ON ECONOMETRIC INFERENCE 
AND MULTIPLE USE OF THE SAME DATA 


BENJAMIN HOLCBLAT AND STEFFEN GRØNNEBERG 

Abstract. In fields that are mainly nonexperimental, such as economics and finance, it 
is inescapable to compute test statistics and confidence regions that are not 
probabilistically independent from previously examined data. The Bayesian and Neyman-Pearson 
inference theories are known to be inadequate for such a practice. We show that these 
inadequacies also hold m.a.e. (modulo approximation error). We develop a general 
econometric theory, called the neoclassical inference theory, that is immune to this 
inadequacy m.a.e. The neoclassical inference theory appears to nest model calibration, 
and most econometric practices, whether they are labelled Bayesian or à la 
Neyman-Pearson. We derive a general, but simple adjustment to make standard errors account 
for the approximation error. 

Keywords: Hypothesis testing; Confidence region; Estimation; Model calibration. 

JEL classification: C1. 


1. Introduction 

By definition, in nonexperimental fields, new data cannot be generated. Consequently, 
it is inescapable to compute test statistics and confidence regions that are not 
probabilistically independent from previously examined data. By Skorohod's representation 
(1976), this practice is equivalent to using the same data twice, so that we call it 
multiple use of the same data.^1 The main objective of this paper is to develop a general 
econometric theory, called the neoclassical inference theory,^2 that is adequate for multiple 
use of the same data m.a.e. (modulo approximation error). The Bayesian and Neyman-Pearson 
inference theories are not adequate for such a practice, even m.a.e. Thus, if we 
set aside approximation errors, for which we provide an adjustment, this paper elucidates 
why econometric inference is possible in fields that are mainly nonexperimental, such as 
economics and finance. 

^1 Skorohod's representation (1976) states that, for any two Borel random variables $Y$ and $Z$, there 
exist a Borel random variable $U$ independent from $Y$, and a Borel function $h(.,.)$ such that $Z = h(Y, U)$. 
Thus, if $Y$ and $Z$ are not independent, using first $Y$, and then $Z$ is equivalent to using first $Y$, and then 
reusing $Y$ with $U$. 

^2 There are two reasons for this name. Firstly, it is a classical theory, in the statistical sense of the term, 
i.e., in this theory, the unknown parameter $\theta_0$ is not treated as a random variable, but as a constant. 
Secondly, it is neoclassical in the historical sense of the term: it seems to formalize underlying principles 
of work by classical authors (e.g., Laplace, 1812/1820, livre II, chap. 3; Fisher, 1925/1973, part V). 

1.1. Key idea. In Bayesian and Neyman-Pearson theories, test statistics and confidence 
regions are not independent from data, as the former are functions of the latter, even 
m.a.e. Then, if the same realized data are re-used to compute new test statistics or 
confidence regions, distributions conditional on the previously observed statistics should 
be considered (e.g., Lehmann and Romano, 1959/2005, sec. 10.1 for Neyman-Pearson 
theory; Savage, 1954/1972, sec. 3.5 for Bayesian theory). We show that this conditioning, 
which is typically ignored in practice, is a challenge for Bayesian and Neyman-Pearson 
theories. The neoclassical inference theory circumvents this conditioning. The key idea is 
to use realized data to approximate the distribution of random variables, called generic 
proxies, that have the same unconditional distribution as data-based statistics, but are 
probabilistically independent from the realized data. 

Example. Let $X_{1:T} := (X_t)_{t=1}^{T}$ be data that are assumed to be $T$ i.i.d. (independent and 
identically distributed) random variables following a Gaussian distribution with mean $\theta_0$ 
and standard deviation $s$, denoted $\mathcal{N}(\theta_0, s)$. The realized data are 
$X_{1:T}(\omega) := (X_t(\omega))_{t=1}^{T}$, where $\omega$ denotes an element of the sample space $\Omega$. 
We want to make inference about the unknown parameter $\theta_0 = \mathbb{E}(X_1)$ through its 
finite-sample proxy, the average. Now, consider generic data $X^*_{1:T} := (X^*_t)_{t=1}^{T}$ that are 
independently generated by the same Gaussian distribution as the data $X_{1:T}$, i.e., $X_{1:T}$ and 
$X^*_{1:T}$ are independent, but have the same unconditional distribution. Then, the average of the 
data $X_{1:T}$, denoted $\bar{X}_T := \frac{1}{T}\sum_{t=1}^{T} X_t$, and the average of the generic data, 
denoted $\bar{X}^*_T$, have the same unconditional distribution, $\mathcal{N}(\theta_0, s/\sqrt{T})$, and 
are equally informative about $\theta_0$. Nevertheless, previous knowledge of the realized data 
$X_{1:T}(\omega)$ typically affects the distribution of $\bar{X}_T$, but does not affect the distribution 
of the generic proxy $\bar{X}^*_T$. For example, if the realized average $\bar{X}_T(\omega)$ is known from a 
previous study, there is no uncertainty about it, so that its distribution is a Dirac at the realized 
average (i.e., $\delta_{\bar{X}_T(\omega)}$), while the distribution of $\bar{X}^*_T$ is still the same 
Gaussian distribution, $\mathcal{N}(\theta_0, s/\sqrt{T})$. Thus, the idea of the neoclassical inference 
theory is to rely on an approximation of the distribution of $\bar{X}^*_T$ to make inference about 
$\theta_0$. Although the data $X_{1:T}$ are independent from the generic proxy $\bar{X}^*_T$, their 
observation provides an approximation of its distribution. For example, by the Lindeberg-Lévy CLT 
(central limit theorem), a Gaussian distribution centered at the realized average, $\bar{X}_T(\omega)$, 
with standard deviation $s_T(X_{1:T}(\omega))/\sqrt{T}$, is an approximation of the distribution of 
$\bar{X}^*_T$, which is $\mathcal{N}(\theta_0, s/\sqrt{T})$. 

◇
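The example can be checked numerically. The following Python sketch is an illustration only: the parameter values, the seed, the number of replications and the helper name `draw_average` are assumptions made for the simulation, and the sample standard deviation is used as the scale estimate. It draws one realized sample, builds the Gaussian approximation centered at the realized average, and compares its scale with the simulated scale of averages computed on independently generated generic samples.

```python
import math
import random
import statistics

# Illustrative values (assumptions, not taken from the paper).
THETA0, S, T, REPS = 1.0, 2.0, 400, 2000
rng = random.Random(12345)

def draw_average(rng, theta0, s, size):
    """Average of `size` i.i.d. Gaussian draws with mean theta0 and std s."""
    return sum(rng.gauss(theta0, s) for _ in range(size)) / size

# Realized data: one sample, observed once and then fixed.
realized = [rng.gauss(THETA0, S) for _ in range(T)]
realized_mean = statistics.fmean(realized)
realized_se = statistics.stdev(realized) / math.sqrt(T)

# Generic proxies: averages of fresh samples, independent of the realized data.
generic_means = [draw_average(rng, THETA0, S, T) for _ in range(REPS)]

# Exact distribution of the generic proxy: N(theta0, s / sqrt(T)).
exact_se = S / math.sqrt(T)
simulated_se = statistics.stdev(generic_means)

print(f"approximation from realized data: N({realized_mean:.3f}, {realized_se:.3f})")
print(f"generic proxy, exact            : N({THETA0:.3f}, {exact_se:.3f})")
print(f"generic proxy, simulated scale  : {simulated_se:.3f}")
```

In the simulation the true parameter is known, so the quality of the approximation can be inspected directly; in an application only the first printed line would be available.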


A similar idea is present in Monte-Carlo simulation methods: The observation of 
realized random variables enables the approximation of the distribution of a generic random 
variable, which is independent from the realized ones. In fact, this similarity has a 
mathematical underpinning under the standard assumption of ergodicity, which stipulates an 
equivalence between exploration of the sample space and exploration of the time 
dimension. 

In a way, the neoclassical inference theory generalizes the immunity of the standard 
justification of point estimators, consistency, to confidence intervals and tests. Unlike the 
Neyman-Pearson justifications for tests and confidence regions, consistency is immune to 
multiple use of the same data m.a.e. A consistent point estimator of a parameter $\theta_0$ does 
not depend on the realized data m.a.e.: by the definition of consistency, for almost all 
possible realizations of the data, such a point estimator is arbitrarily close to the fixed 
parameter $\theta_0$ m.a.e. See Appendix A for a formal statement. 


1.2. Literature overview. In the statistical and econometric literature, the issue raised 
by multiple use of the same data for Neyman-Pearson and Bayesian inference theories 
has been occasionally discussed: see, e.g., Lehmann and Romano, 1959/2005, sec. 10.1 for 
Neyman-Pearson theory; Berger, 1980/2006, pp. 112-113 and 284 for Bayesian theory; 
Leamer, 1978, pp. v-vii for an assessment of the acuteness of the issue, and chap. 9 for 
an ad hoc proposal to mitigate the issue for Bayesian inference. The common wisdom 
seems to be that the issue is unavoidable. To the best of our knowledge, no general 
formal solution has been proposed even m.a.e. Holcblat (2012) relies on the idea behind 
the neoclassical theory only in the particular case of the empirical saddlepoint (ESP) 
approximation. 

Multiple use of the same data is not treated in the large literature about multiple 
hypothesis testing (e.g., Lehmann and Romano, 1959/2005, chap. 9 for a perspective à la 
Neyman-Pearson; Berger, 1980/2006, chap. 7 for a Bayesian perspective). In this 
literature, it is assumed that the set of all statistics to be potentially computed is determined 
before examination of the data. This situation does not correspond to nonexperimental 
fields as their evolution is often the result of a hard-to-predict dialogue between theory 
and empirical studies based on more or less the same realized data. For example, 
Compustat, CRSP (Center for Research in Security Prices) and BEA (Bureau of Economic 
Analysis) data have been re-used in numerous empirical studies in corporate finance, asset 
pricing and macroeconomics, respectively. 


1.3. Organization of the paper. The paper is organized as follows. Sections 2 and 3, 
respectively, show that Neyman-Pearson and Bayesian inference theories are inadequate 
for multiple use of the same data even m.a.e. Section 4 presents elements of the 
neoclassical theory, and proves its immunity to multiple use of the same data m.a.e. Section 5 
revisits model calibration and prominent econometric practices from a neoclassical point 
of view, and presents a simple adjustment to make standard errors account for the 
approximation error. An important point to note is that this standard-error adjustment 
holds under the usual $\sqrt{T}$-asymptotic normality assumption, thus applying to a large 
part of econometric practice. Some readers may find sections 2 and 3 obvious, but the 
latter should be considered in comparison with section 4. The contribution of this paper 
is essentially theoretical, and not mathematical. Applied econometricians might want to 
focus on subsection 5.3, which assesses the most common econometric practice from a 
neoclassical point of view. 


Remark 1. In accordance with the main objective of this paper, we reason m.a.e. in 
sections 2-4. For the Neyman-Pearson theory, ignoring approximation errors means that we 
consider the asymptotic limit superior (or limit inferior) of the outer (or inner) probability 
distribution to be exact for the given sample size, when the sampling distribution is not 
available. For the Bayesian theory, this has no bearing because probability distributions 
are assumed to be perfectly known by the econometrician (e.g., Savage, 1954/1972, pp. 
59-60). For the neoclassical inference theory, this means that we consider the 
approximation of the sampling distribution of the generic proxy to be exact. Unlike sections 2-4, 
section 5 does not reason m.a.e., and treats the approximation error from a neoclassical 
point of view. ◇ 


2. Neyman-Pearson theory and multiple use of the same data 


In this section, we explain the m.a.e. inadequacy of the Neyman-Pearson theory for 
multiple use of the same data. Subsection 2.1 informally explains it in the standard case 
of asymptotic t-statistics. Subsection 2.2 formalizes it in the general case. 


2.1. The case of asymptotic t-statistics. Asymptotic t-statistics are among the 
most widely used statistics to compute confidence regions or carry out hypothesis tests. 
The Neyman-Pearson theoretical justification of an asymptotic t-test of size $\alpha$ is that the 
t-statistic has a probability $1-\alpha$ m.a.e. to be between the $\alpha/2$ and $1-\alpha/2$ quantiles 
of a standard Gaussian distribution under the test hypothesis. However, once computed, 
the t-statistic is in the non-rejection region with probability 0 or 1, i.e., it is or it is not 
in the non-rejection region. Thus, if the result of this first test leads the econometrician 
to compute a second t-test of size $\alpha$, the corresponding t-statistic typically cannot have 
a probability of $1-\alpha$ m.a.e. to be between the $\alpha/2$ and $1-\alpha/2$ quantiles of a standard 
Gaussian distribution under the test hypothesis. The observation of the first t-statistic 
has removed a part of the randomness of the second t-statistic. Except in a few cases (e.g., 
Gourieroux and Monfort, 1989/1996, chap. 19), t-statistics computed on the same data 
set are not independent. This means that the Neyman-Pearson theoretical justification 
does not hold for the second t-test. Because of the duality between hypothesis testing 
and confidence regions in the Neyman-Pearson theory, there is the same concern for 
confidence intervals based on t-statistics. Subsection 2.2 proves that this concern about 
the Neyman-Pearson theoretical justification of confidence regions and tests is not limited 
to t-statistics. 
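The loss of the nominal size can be seen in a small Monte-Carlo experiment. The Python sketch below is a hypothetical illustration, not a procedure from the paper: the sample size, the number of replications, the seed, and the choice of a second t-test computed on the first half of the same sample are all assumptions. Under a true test hypothesis, it estimates the rejection frequency of the second test, first unconditionally and then given that the first test did not reject.

```python
import math
import random

CRIT = 1.96            # two-sided 5% critical value of the standard Gaussian
T, REPS = 50, 20000    # illustrative sample size and number of replications
rng = random.Random(2024)

def t_stat(sample):
    """Asymptotic t-statistic for the hypothesis that the mean is zero."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return math.sqrt(n) * mean / math.sqrt(var)

second_rejects = 0        # second test rejects
first_keeps = 0           # first test does not reject
second_given_first = 0    # second test rejects and first test did not

for _ in range(REPS):
    data = [rng.gauss(0.0, 1.0) for _ in range(T)]    # the test hypothesis is true
    first = abs(t_stat(data)) > CRIT                  # test on the full sample
    second = abs(t_stat(data[: T // 2])) > CRIT       # test on part of the same data
    second_rejects += second
    if not first:
        first_keeps += 1
        second_given_first += second

unconditional = second_rejects / REPS
conditional = second_given_first / first_keeps
print(f"P(second test rejects)                   ~ {unconditional:.3f}")
print(f"P(second test rejects | first kept H)    ~ {conditional:.3f}")
```

The two frequencies differ because the two t-statistics are computed on overlapping data and are therefore not independent.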

2.2. The general case. Assumption 1 sets up the minimal elements of the 
Neyman-Pearson theory that are necessary to formalize multiple use of the same data. 

Assumption 1. (a) Let $(\Omega, \mathcal{E}_\Omega)$ be a measurable space where $\mathcal{E}_\Omega$ denotes a $\sigma$-algebra of 
$\Omega$. (b) Let $\theta_0 \in \Theta$ be the unknown parameter, where $\Theta$ denotes the parameter space. (c) 
Let $X_{1:T}$ be some data, i.e., a measurable mapping from $(\Omega, \mathcal{E}_\Omega)$ to a measurable space 
$(\mathbf{S}_T, \mathcal{S}_T)$, where $T$ and $\mathbf{S}_T$ denote the sample size and the observation space, respectively. 

Remark 2. There is no restriction on the parameter space $\Theta$, so that it can be Euclidean 
or infinite-dimensional. ◇ 

Definitions 1 and 2 recall the definitions of Neyman-Pearson confidence regions and tests. 

Definition 1 (Neyman-Pearson confidence region). Let $\alpha \in [0,1]$, and $P$ a probability 
measure on $(\Omega, \mathcal{E}_\Omega)$. Under $P$, a $1-\alpha$ Neyman-Pearson confidence region $C_{1-\alpha,T}$ is a 
measurable random subset of the parameter space that has a probability of at least $1-\alpha$ m.a.e. 
to contain the unknown parameter $\theta_0$, i.e., (i) for all $\omega \in \Omega$, $C_{1-\alpha,T}(X_{1:T}(\omega)) \subset \Theta$, (ii) 
$\{x_{1:T} \in \mathbf{S}_T : \theta_0 \in C_{1-\alpha,T}(x_{1:T})\} \in \mathcal{S}_T$ m.a.e., and (iii) 
$P(\{\omega \in \Omega : \theta_0 \in C_{1-\alpha,T}(X_{1:T}(\omega))\}) \geq 1-\alpha$ m.a.e. 

Definition 2 (Neyman-Pearson test). Let $H$ be a test hypothesis, and $P$ a probability 
measure on $(\Omega, \mathcal{E}_\Omega)$. Define the measurable decision space $(\mathbf{D}, \mathcal{P}(\mathbf{D}))$ where $\mathbf{D} := \{d_H, d_A\}$, 
and $\mathcal{P}(\mathbf{D})$ denotes the power set of $\mathbf{D}$. The decisions $d_H$ and $d_A$, respectively, correspond 
to the non-rejection and the rejection of the test hypothesis $H$. Under $P$, a 
Neyman-Pearson test of level $\alpha \in [0,1]$ is a decision rule $d_T(.)$ that leads to the rejection of $H$ 
with a probability of at most $\alpha$ m.a.e. under $H$, i.e., a $\mathcal{S}_T/\mathcal{P}(\mathbf{D})$-measurable function $d_T$ 
m.a.e. s.t. $P(d_T(X_{1:T}) = d_A) \leq \alpha$ m.a.e., if $H$ is true. 

Remark 3. As indicated by the qualification “m.a.e.,” when no finite-sample distribution 
is available, we consider the asymptotic limit superior (or limit inferior) of the outer 
(or inner) probability distribution (see Remark 1). Thus, our setup covers 
asymptotic Neyman-Pearson tests and confidence regions, and the case à la 
Hoffmann-Jørgensen (see Wellner and van der Vaart, 1996), in which finite-sample statistics are not 
measurable although their limit is measurable. In the latter case, in Definitions 1 and 2, 
$P(\theta_0 \in C_{1-\alpha,T}(X_{1:T}))$ and $P(d_T(X_{1:T}) = d_A)$, respectively, stand for 
$\liminf_{T \to \infty} P_*(\theta_0 \in C_{1-\alpha,T}(X_{1:T}))$ and $\limsup_{T \to \infty} P^*(d_T(X_{1:T}) = d_A)$, where $P_*$ and $P^*$, 
respectively, denote the inner and outer probabilities implied by $P$. ◇ 

Theorem 1 formalizes the concern raised by previous knowledge of realized data that 
are not probabilistically independent from Neyman-Pearson confidence regions and tests. 

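Theorem 1 (Neyman-Pearson inadequacy). Let $\mathbb{P}$ be the unknown probability measure on 
$(\Omega, \mathcal{E}_\Omega)$, and $\{X_{1:T} \in A_T\} \in \mathcal{E}_\Omega$ a nonzero-probability event, i.e., $\mathbb{P}(X_{1:T} \in A_T) = c > 0$. 
For all $E \in \mathcal{E}_\Omega$, define 
$\mathbb{P}(E|X_{1:T} \in A_T) := \frac{\mathbb{P}(E \cap \{X_{1:T} \in A_T\})}{\mathbb{P}(X_{1:T} \in A_T)}$. Denote a Neyman-Pearson 
$1-\alpha$ confidence region for $\theta_0$ under $\mathbb{P}$ with $C_{1-\alpha,T}$, and a Neyman-Pearson test of level 
$\alpha$ under $\mathbb{P}$ with $d_T(.)$. Under Assumption 1, 

i) if $\{X_{1:T} \in A_T\}$ and $\{\theta_0 \in C_{1-\alpha,T}(X_{1:T})\}$ are not independent m.a.e., then 

$$\mathbb{P}(\theta_0 \in C_{1-\alpha,T}(X_{1:T}) | X_{1:T} \in A_T) \neq \mathbb{P}(\theta_0 \in C_{1-\alpha,T}(X_{1:T})) \quad \text{m.a.e.;}$$

ii) if $\{X_{1:T} \in A_T\}$ and $\{d_T(X_{1:T}) = d_A\}$ are not independent m.a.e., then 

$$\mathbb{P}(d_T(X_{1:T}) = d_A | X_{1:T} \in A_T) \neq \mathbb{P}(d_T(X_{1:T}) = d_A) \quad \text{m.a.e.}$$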


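Proof. It is definition chasing, essentially. For (i) and (ii), respectively denote 
$\{\theta_0 \in C_{1-\alpha,T}(X_{1:T})\}$ and $\{d_T(X_{1:T}) = d_A\}$ with $E$. By definition of independence between 
events, m.a.e., 

$$\mathbb{P}(E \cap \{X_{1:T} \in A_T\}) \neq \mathbb{P}(E)\,\mathbb{P}(X_{1:T} \in A_T) \overset{(a)}{\Leftrightarrow} \frac{\mathbb{P}(E \cap \{X_{1:T} \in A_T\})}{\mathbb{P}(X_{1:T} \in A_T)} \neq \mathbb{P}(E) \overset{(b)}{\Leftrightarrow} \mathbb{P}(E|X_{1:T} \in A_T) \neq \mathbb{P}(E),$$

where equivalences can be seen as follows. (a) By assumption, $\mathbb{P}(X_{1:T} \in A_T) > 0$. 
(b) By definition, $\mathbb{P}(E|X_{1:T} \in A_T) := \frac{\mathbb{P}(E \cap \{X_{1:T} \in A_T\})}{\mathbb{P}(X_{1:T} \in A_T)}$. See Appendix B on p. 44 
for more details regarding the possible approximation error. □ 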
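A toy case makes Theorem 1 (ii) concrete. The Python sketch below is purely illustrative: the two-coin model, the decision rule and the conditioning event are hypothetical choices, and exact rational arithmetic is used so that no approximation error enters. The test rejects when both flips are heads, which gives it level 1/4 under the test hypothesis of fair coins, while the previously observed event is that the first flip is heads.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy model: two fair coin flips under the test hypothesis.
outcomes = list(product([0, 1], repeat=2))
prob = {x: Fraction(1, 4) for x in outcomes}

def P(event):
    """Probability of an event given as a predicate on outcomes."""
    return sum(p for x, p in prob.items() if event(x))

def reject(x):
    """Decision rule: reject the hypothesis when both flips are heads."""
    return x[0] + x[1] == 2

def observed(x):
    """Previously observed event {X in A_T}: the first flip is heads."""
    return x[0] == 1

p_reject = P(reject)
p_reject_given_observed = P(lambda x: reject(x) and observed(x)) / P(observed)

print("P(reject)            =", p_reject)
print("P(reject | observed) =", p_reject_given_observed)
```

The rejection event and the observed event are not independent, so the conditional rejection probability (1/2) differs from the unconditional one (1/4), as the theorem states.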


The key defining properties of a Neyman-Pearson confidence region and test are, 
respectively, the probability that the confidence region contains the unknown parameter, 
and the probability of rejecting the hypothesis under the test hypothesis. Theorem 1 
proves that these key defining properties are affected by the previous observation of a 
nonzero-probability event $\{X_{1:T} \in A_T\}$ that is not probabilistically independent from the 
corresponding confidence region and test. Before the observation of $\{X_{1:T} \in A_T\}$, the 
probability $P$ of the Neyman-Pearson Definitions 1 and 2 is $\mathbb{P}$ m.a.e., but, after 
observation of $\{X_{1:T} \in A_T\}$, $P$ is $\mathbb{P}(.|X_{1:T} \in A_T)$ m.a.e. 

A solution would be to systematically account for previous knowledge of the data by 
determining conditional probability, such as $\mathbb{P}(.|X_{1:T} \in A_T)$. However, most of the time, 
this is operationally impossible, especially in nonexperimental fields. In 
nonexperimental fields, this previous knowledge can correspond to computed statistics or plots, but 
also to historical events personally experienced or studied. For example, defining valid 
Neyman-Pearson tests or confidence regions for an applied American econometrician who 
studies the US economy appears an impossible task. Moreover, even if it were possible, 
it would make criteria of validity of statistical discoveries path-dependent, and thus 
difficult to understand. Therefore, the Neyman-Pearson inference theory appears practically 
inadequate for nonexperimental fields. 


Remark 4. Because of our focus on multiple use of the same data, in this paper, we 
present the operational impossibility to condition on previous knowledge of the realized 
data as the source of the Neyman-Pearson inadequacy. In fact, if one makes the distinction 
between unknown and random quantities as the Neyman-Pearson theory does (e.g., $\theta_0$ is 
unknown, but constant), it is the realization of the data and not the knowledge of them 
that matters. Thus, one needs to condition on all the data that have been realized prior 
to the determination of the test statistics and confidence regions to be computed. When 
only part of the data at use have been previously realized, we are back to Theorem 1 
and the generic operational impossibility to determine conditional probability. When all 
the data at use have been previously realized, the conditioning is trivial, but then tests 
should have zero type I error probability m.a.e., and, under additional but general 
assumptions, $C_{1-\alpha,T}(X_{1:T}) = \Theta$ m.a.e. $\mathbb{P}$-a.s. See Appendix C on p. 45. ◇ 
3. Bayesian theory and multiple use of the same data 

“In a strictly logical sense, this criticism of (practical) prior dependence on the data cannot be 
refuted.” 
Berger (1980/2006, p. 112). 


As for Neyman-Pearson theory, multiple use of the same data is a challenge for Bayesian 
inference theory. Subsection 3.1 explains the concern in the basic case in which Bayes’ 
formula holds, and subsection 3.2 formalizes it in the general case. 


3.1. The basic case. Taken literally, Bayesian theory regards inference as a 
two-stage game between nature and an econometrician (e.g., Ferguson, 1967; Borovkov, 1984/1998). 
In the first stage, nature draws the parameter according to a prior distribution $\pi_{\theta_0}(.)$, 
and then draws data $X_{1:T}$ according to a conditional probability density function (p.d.f.), 
$\pi_{X_{1:T}|\theta_0}(.|.)$. In the second stage, the econometrician makes inferences about the 
realized parameter value $\theta_0$ given the sample at hand.^3 As usual in game theory, the p.d.f. 
$\pi_{X_{1:T}|\theta_0}(.|.)$ and $\pi_{\theta_0}(.)$ are common knowledge. Thus, the econometrician updates the prior 
distribution, $\pi_{\theta_0}(.)$, thanks to data according to Bayes’ formula 

$$\pi_{\theta_0|X_{1:T}}(\theta|X_{1:T}(\omega)) = \frac{\pi_{X_{1:T}|\theta_0}(X_{1:T}(\omega)|\theta)\,\pi_{\theta_0}(\theta)}{\int_\Theta \pi_{X_{1:T}|\theta_0}(X_{1:T}(\omega)|\dot\theta)\,\pi_{\theta_0}(\dot\theta)\,\mu(d\dot\theta)}$$

to obtain the posterior distribution $\pi_{\theta_0|X_{1:T}}(.|X_{1:T}(\omega))$. 

But, after the p.d.f. of the unknown parameter given the data has been computed, the 
data are known and fixed. Their randomness has disappeared. Thus, the econometrician 
cannot learn anymore from them. If the Bayes formula is applied a second time to the 
same data, the p.d.f. of data conditional on the unknown parameter is one, so that 
the second posterior is equal to the first posterior. Mathematically, Bayesian updating 
becomes 

$$\frac{1 \times \pi_{\theta_0|X_{1:T}}(\theta|X_{1:T}(\omega))}{\int_\Theta 1 \times \pi_{\theta_0|X_{1:T}}(\dot\theta|X_{1:T}(\omega))\,\mu(d\dot\theta)} = \pi_{\theta_0|X_{1:T}}(\theta|X_{1:T}(\omega)).$$

^3 To avoid additional notations, the random parameter is defined as the identity mapping on the 
parameter space $\Theta$, so that its realized value is also denoted by $\theta_0$. 

Therefore, Bayes inference theory cannot justify multiple use of the same data. Subsection 
3.2 shows that the conclusion remains unchanged in the general case, in which densities 
do not necessarily exist. 
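The basic case can be traced on a finite parameter grid. The Python sketch below is a minimal illustration under assumed values: the three candidate parameter values, the four data points, the unit-variance Gaussian likelihood and all function names are hypothetical. It applies Bayes' formula once with the Gaussian likelihood, then a second time with the likelihood of the now fixed data set to one, and also computes what happens if the Gaussian likelihood is naively counted twice.

```python
import math

def normalize(weights):
    """Rescale nonnegative weights so that they sum to one."""
    total = sum(weights.values())
    return {theta: w / total for theta, w in weights.items()}

def bayes_update(prior, likelihood):
    """One application of Bayes' formula on a finite parameter grid."""
    return normalize({theta: likelihood(theta) * p for theta, p in prior.items()})

# Hypothetical setup: three candidate means, unit-variance Gaussian data.
prior = {-1.0: 1 / 3, 0.0: 1 / 3, 1.0: 1 / 3}
data = [0.3, 1.2, 0.8, -0.1]

def gaussian_likelihood(theta):
    # The constant (2*pi)^(-n/2) is omitted: it cancels in the normalization.
    return math.exp(-0.5 * sum((x - theta) ** 2 for x in data))

def degenerate_likelihood(theta):
    # Once realized, the data are fixed: their conditional p.d.f. is one for every theta.
    return 1.0

first = bayes_update(prior, gaussian_likelihood)
second = bayes_update(first, degenerate_likelihood)   # coherent second use of the data
naive = bayes_update(first, gaussian_likelihood)      # likelihood counted twice

for theta in prior:
    print(f"theta={theta:+.0f}  first={first[theta]:.4f}  "
          f"second={second[theta]:.4f}  naive={naive[theta]:.4f}")
```

The coherent second update returns the first posterior unchanged, whereas the naive reuse produces a more concentrated distribution that Bayes' formula does not justify.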


3.2. The general case. Assumption 2 defines the general structure of Bayesian inference 
along the lines of Florens, Mouchart and Rolin (1990).^4 


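Assumption 2. (a) Let $(\Omega \times \Theta, \mathcal{E}_\Omega \otimes \mathcal{E}_\Theta, \Pi)$ be a probability space, where $(\Theta, \mathcal{E}_\Theta)$ is the 
parameter space, and where $\mathcal{E}_\Omega \otimes \mathcal{E}_\Theta$ denotes the product $\sigma$-algebra of the $\sigma$-algebras $\mathcal{E}_\Omega$ 
and $\mathcal{E}_\Theta$. (b) Let $\{\mathcal{X}_{\Omega,n}\}_{n \geq 0}$ be a filtration in $\mathcal{E}_\Omega$. Define a filtration $\{\mathcal{X}_n\}_{n \geq 0}$ in $\mathcal{E}_\Omega \otimes \mathcal{E}_\Theta$ 
s.t., for all $n \in \mathbf{N}$, $\mathcal{X}_n = \mathcal{X}_{\Omega,n} \otimes \{\Theta, \emptyset\}$. 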


The filtration $\{\mathcal{X}_n\}_{n \geq 0}$ corresponds to the accumulation of information that comes from 
the sample space. In other words, $\mathcal{X}_n$ is the information set of the econometrician after $n$ 
Bayesian updates. Definition 3 recalls the general definition of posterior probabilities. 

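Definition 3 (Posterior probability). For all $n \in \mathbf{N}$ and $B \in \{\Omega, \emptyset\} \otimes \mathcal{E}_\Theta$, the 
$\mathcal{X}_n$-posterior probability of $B$ is the expectation of the indicator function $1_B$ conditional on 
$\mathcal{X}_n$, i.e., $\mathbb{E}(1_B|\mathcal{X}_n)$. 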

Implicitly, Definition 3 also defines priors, as the distinction between a prior and a 
posterior depends on the update of reference. After $n$ Bayesian updates, $\mathbb{E}(.|\mathcal{X}_n)$ is the 
prior, while $\mathbb{E}(.|\mathcal{X}_{n+1})$ is the posterior. 


Remark 5. The framework is general: We do not impose restrictions on the parameter 
space $\Theta$, or require the existence of regular conditional probabilities. ◇ 

^4 The main difference between their notations and our notations is the following. Unlike them, we do 
not identify $\sigma$-algebras with their inverse image by the coordinate map. See Florens, Mouchart and 
Rolin, 1990, p. 11, warning. Our choice makes the presentation less elegant, but it allows us to maintain 
notational consistency within this paper. 
Assumption 3 specifies the minimal additional ingredients necessary to study multiple 
use of the same data. 

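Assumption 3. (a) Let $X_{1:T}$ be some data, i.e., a measurable mapping from $(\Omega, \mathcal{E}_\Omega)$ to 
the measurable space $(\mathbf{S}_T, \mathcal{S}_T)$, where $\mathbf{S}_T$ denotes the observation space. (b) There exists 
$n_1 \in \mathbf{N}$ s.t. $\mathcal{X}_{n_1+1} = \mathcal{X}_{n_1} \vee [\sigma(X_{1:T}) \otimes \{\Theta, \emptyset\}]$, where $\sigma(X_{1:T})$ denotes the $\sigma$-algebra generated 
by $X_{1:T}$, and $\mathcal{X}_{n_1} \vee [\sigma(X_{1:T}) \otimes \{\Theta, \emptyset\}]$ the $\sigma$-algebra generated by the union of $\mathcal{X}_{n_1}$ and 
$[\sigma(X_{1:T}) \otimes \{\Theta, \emptyset\}]$, i.e., $\mathcal{X}_{n_1} \vee [\sigma(X_{1:T}) \otimes \{\Theta, \emptyset\}] := \sigma(\mathcal{X}_{n_1} \cup [\sigma(X_{1:T}) \otimes \{\Theta, \emptyset\}])$. 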

Assumption 3(a) requires the existence of data, $X_{1:T}$, while Assumption 3(b) requires 
that update $n_1$ comes from the use of the data $X_{1:T}$. Theorem 2 formalizes the effect of 
a second use of the same data $X_{1:T}$. 

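Theorem 2 (Bayesian inadequacy). Let $n_2 \in [\![n_1 + 1, \infty[\![$ s.t. $\mathcal{X}_{n_2+1} = \mathcal{X}_{n_2} \vee [\sigma(X_{1:T}) \otimes 
\{\Theta, \emptyset\}]$. Then, under Assumptions 2 and 3, the $\mathcal{X}_{n_2+1}$-posterior probability and 
$\mathcal{X}_{n_2}$-posterior probability are equal, i.e., for all $B \in \{\Omega, \emptyset\} \otimes \mathcal{E}_\Theta$, 

$$\mathbb{E}(1_B|\mathcal{X}_{n_2+1}) = \mathbb{E}(1_B|\mathcal{X}_{n_2}).$$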

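Proof. It is definition chasing. By definition, 
$\mathcal{X}_{n_2+1} := \mathcal{X}_{n_2} \vee [\sigma(X_{1:T}) \otimes \{\Theta, \emptyset\}] := \sigma(\mathcal{X}_{n_2} \cup [\sigma(X_{1:T}) \otimes \{\Theta, \emptyset\}]) \overset{(a)}{=} \sigma(\mathcal{X}_{n_2}) \overset{(b)}{=} \mathcal{X}_{n_2}$, 
where equalities can be seen as follows. (a) By Assumption 3(b) and because a filtration is 
increasing, $[\sigma(X_{1:T}) \otimes \{\Theta, \emptyset\}] \subset \mathcal{X}_{n_1+1} \subset \mathcal{X}_{n_2}$. (b) $\mathcal{X}_{n_2}$ is itself a $\sigma$-algebra by definition of 
a filtration. □ 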

The update $n_1 + 1$ corresponds to the first use of the data $X_{1:T}$, while the update 
$n_2 + 1$ corresponds to the second use. Theorem 2 proves that the second use of the same 
data does not increase the information set, $\mathcal{X}_{n_2}$, and thus the posterior remains the same. 
An immediate corollary of this result is the absence of formal Bayesian justification for 
analyses, in which an econometrician claims to have obtained a different “posterior” after 
a first use of the same data. Theorem 2 shows that such analyses are incompatible with 
Bayesian inference theory. 
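On a finite space, the argument of Theorem 2 can be replayed with partitions, since a $\sigma$-algebra on a finite set is determined by its atoms and the $\sigma$-algebra generated by a union corresponds to the common refinement of the partitions. The Python sketch below is only an illustration under assumed values: the two binary observations, the two-point parameter space, the success probabilities 3/10 and 7/10, and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def partition_from(points, statistic):
    """Atoms of the sigma-algebra generated by a statistic on a finite space."""
    atoms = {}
    for w in points:
        atoms.setdefault(statistic(w), set()).add(w)
    return {frozenset(a) for a in atoms.values()}

def join(p, q):
    """Common refinement: atoms of the sigma-algebra generated by the union."""
    return {a & b for a in p for b in q if a & b}

def posterior(info, prob, event):
    """Conditional probability of `event` on each atom of the information set."""
    return {atom: sum(prob[w] for w in atom & event) / sum(prob[w] for w in atom)
            for atom in info}

# Hypothetical joint model on points (x1, x2, theta), with a uniform prior on theta.
points = list(product([0, 1], [0, 1], [0, 1]))

def weight(x1, x2, theta):
    p = Fraction(7, 10) if theta == 1 else Fraction(3, 10)
    return Fraction(1, 2) * (p if x1 else 1 - p) * (p if x2 else 1 - p)

prob = {w: weight(*w) for w in points}
event = frozenset(w for w in points if w[2] == 1)          # B = {theta = 1}

sigma_data = partition_from(points, lambda w: (w[0], w[1]))  # information in the data
info_first = join({frozenset(points)}, sigma_data)           # first use of the data
info_second = join(info_first, sigma_data)                   # second use of the same data

post_first = posterior(info_first, prob, event)
post_second = posterior(info_second, prob, event)
print(info_first == info_second, post_first == post_second)
```

The second join leaves the information set unchanged, so the two posteriors coincide atom by atom.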
Remark 6. For brevity and relevance, we mainly consider the Neyman-Pearson and 
Bayesian inadequacy when there is partial, and complete previous knowledge of the 
realized data, respectively. However, in parallel, complete, and partial previous knowledge 
causes Neyman-Pearson and Bayesian inadequacy, respectively. Complete previous 
knowledge causes Neyman-Pearson inadequacy when the randomness needed to justify a new 
test or a new confidence region has completely disappeared. For example, when one 
wants a t-test of size $\alpha$ after computation of a $1-\alpha$ confidence interval based on the same 
statistic, the t-statistic is between the $\alpha/2$ and $1-\alpha/2$ quantiles with probability 0 or 
1 because a confidence interval corresponds to the set of hypotheses that would not have 
been rejected. See also Proposition on p. which can be seen as a formalization of 
the case, in which all the data have been previously examined. Partial knowledge causes 
Bayesian inadequacy when it is impossible to incorporate previous information through a 
formal Bayesian updating. For example, Bayesian inference theory is typically inadequate 
when partial previous knowledge corresponds to historical events personally experienced. 
See also Savage (1954/1972, pp. 59-60) for more details about this inadequacy. ◇ 

Like Neyman-Pearson theory, Bayesian inference theory is inadequate for multiple use 
of the same data. Both theories rely on a randomness that disappears as data 
are used, so that multiple use of the same data appears difficult to justify. However, in 
nonexperimental fields, multiple use of the same data is inescapable. Thus, the relevance of 
Neyman-Pearson and Bayesian inference theories to nonexperimental fields is not obvious. 


4. Elements of neoclassical inference theory 


The purpose of this section is to introduce elements of a theory that is immune to 
multiple use of the same data m.a.e., and that provides a common framework for point 
estimation, confidence regions and hypothesis testing. There does not seem to exist such 
an inference theory in the literature. 


This section is organized as follows. Subsection 4.1 presents the main idea of the 
neoclassical theory, subsection 4.2 its main elements, and subsection 4.3 proves that it is 
theoretically immune to multiple use of the same data m.a.e. Hereafter, for simplicity, we 
only consider the parameter space $\Theta$ to be a Euclidean space. 

4.1. Main idea. The setup of the neoclassical inference theory is standard (e.g., Borovkov, 
1984/1998, chap. 2). An econometrician wants to infer a constant and unknown 
parameter $\theta_0$ of an econometric model $(\Omega, \mathcal{E}_\Omega, \mathbb{P})$. The parameter $\theta_0$ is assumed to belong to a 
known parameter space, denoted $\Theta$. The only difference between the unknown parameter, 
$\theta_0$, and other elements of the parameter space $\Theta$ is that the former equals a mapping 
of the generating probability measure, $\mathbb{P}$, i.e., 

$$\theta_0 =: G(\mathbb{P}) \qquad (1)$$

where $G(.)$ maps probability measures to elements of the parameter space $\Theta$. The 
econometrician does not know $\mathbb{P}$, but has access to some data $X_{1:T} := (X_t)_{t=1}^{T}$ that are assumed 
to be generated by the econometric model $(\Omega, \mathcal{E}_\Omega, \mathbb{P})$. Thus, the econometrician 
approximates $\theta_0$ using the data $X_{1:T}$, i.e., defines a proxy 

$$\hat\theta^*_T := H_T(X_{1:T}) \qquad (2)$$

where $H_T$ is a mapping from the observation space $\mathbf{S}_T$ to the parameter space $\Theta$. Often, 
$H_T(X_{1:T}) = G(P_{X_{1:T}})$, where $P_{X_{1:T}} := \frac{1}{T}\sum_{t=1}^{T} \delta_{X_t}$ is the empirical measure with $\delta_{X_t}$ 
denoting the Dirac measure at $X_t$. We call $\hat\theta^*_T$ a finite-sample proxy of the unknown 
parameter $\theta_0$. 

Now, if there exist some data $X^*_{1:T}$ with the same unconditional distribution as $X_{1:T}$ 
(i.e., $\mathbb{P} \circ X_{1:T}^{-1} = \mathbb{P} \circ X_{1:T}^{*-1}$) but independent from them, these data induce an equally 
informative finite-sample proxy of $\theta_0$ 

$$\theta^*_T := H_T(X^*_{1:T}). \qquad (3)$$

Building on this remark, the idea of the neoclassical inference theory is to base inference 
of $\theta_0$ on an approximation of the distribution of a generic finite-sample proxy of $\theta_0$ that 
has the same unconditional distribution as $\hat\theta^*_T$, but is independent from the data $X_{1:T}$. We 
denote the generic finite-sample proxy with $\theta^*_T$. In this paper, we take the finite-sample 
proxy (i.e., the choice of $H_T$) as given, in order to stay away from the question of 
the properties of the generic proxy $\theta^*_T$.^5 


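Example (continued from p. 2). From equation (1), 
$\theta_0 = \int_{\mathbf{R}} x \frac{1}{s\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\theta_0}{s}\right)^2\right] dx = \int_\Omega X_1(\omega)\,\mathbb{P}(d\omega) =: G(\mathbb{P})$. 
From equation (2), $\hat\theta^*_T := H_T(X_{1:T}) := \frac{1}{T}\sum_{t=1}^{T} X_t = \bar{X}_T$. 
From equation (3), $\theta^*_T := \bar{X}^*_T$. Assume that, as previously mentioned, the asymptotic 
Gaussian approximation is used to approximate the distribution of $\theta^*_T$. Then, the 
distribution of the generic finite-sample proxy $\theta^*_T$ is 
$\mathcal{N}\left(\bar{X}_T(\omega), s_T(X_{1:T}(\omega))/\sqrt{T}\right)$ m.a.e. 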


4.2. Definitions. We require the following assumptions, in addition to Assumption 1, to 
outline the neoclassical theory. 

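Assumption 4. (a) Let $\mathbb{P}$ be the unknown probability measure on $(\Omega, \mathcal{E}_\Omega)$. (b) Let $(\Theta, \mathcal{E}_\Theta)$ 
be a measurable space s.t. $\Theta$ is a Borel subset of $\mathbf{R}^p$ with $p \in \mathbf{N} \setminus \{0\}$, and $\mathcal{E}_\Theta$ denotes 
the Borel $\sigma$-algebra on $\Theta$. (c) Let $U$ be a uniformly distributed random variable with 
support $[0,1]$ on the probability space $(\Omega, \mathcal{E}_\Omega, \mathbb{P})$, s.t. $X_{1:T}$ and $U$ are independent for all 
$T \in [\![1, \infty[\![$. 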

Assumption 4(c) is the only assumption that is new with respect to Neyman-Pearson 
theory. Nevertheless, its novelty is limited as econometric reasoning (e.g., asymptotic 
theory) and implementations of the Neyman-Pearson theory (e.g., bootstrap) often 
implicitly require it. The random variable $U$ is a randomization device that ensures the 
existence of the generic finite-sample proxy $\theta^*_T$. More generally, Assumption 4(c) ensures 
the existence of a countable number of random variables with any probability distribution 
(e.g., Kallenberg, 1997/2002, Lemmas 3.21 and 3.22). Such an assumption is innocuous. 
We can always redefine the probability space $(\Omega, \mathcal{E}_\Omega, \mathbb{P})$ as the product of an original 
probability space with the probability space $([0,1], \mathcal{B}([0,1]), \lambda)$, where $\mathcal{B}([0,1])$ and $\lambda$, 
respectively, denote the Borel $\sigma$-algebra on $[0,1]$ and the Lebesgue measure (e.g., 
Kallenberg, 1997/2002, pp. 111-112). 

^5 This question, which can be seen as one of the main topics of the statistical and econometric literature, 
corresponds to the study of the properties of what is called the estimator in the Neyman-Pearson theory. 

Thanks to Assumption 4(c), given a (data-based) finite-sample proxy of $\theta_0$, Lemma 1 
proves the existence of a corresponding generic proxy of $\theta_0$. 

Lemma 1 (Existence of a generic proxy). Let the finite-sample proxy of $\theta_0$ be 
$\hat\theta^*_T := H_T(X_{1:T})$, where $H_T$ is a measurable mapping from $(\mathbf{S}_T, \mathcal{S}_T)$ to $(\Theta, \mathcal{E}_\Theta)$. Under 
Assumptions 1 and 4, there exists a corresponding generic finite-sample proxy of $\theta_0$, i.e., a 
measurable mapping $\theta^*_T$ from $(\Omega, \mathcal{E}_\Omega)$ to $(\Theta, \mathcal{E}_\Theta)$ that is independent from $X_{1:T}$, but has 
the same unconditional distribution as $\hat\theta^*_T$. 


Proof. This is an application of a known generalization of the standard inverse transform
method that is used to simulate random variables from a uniform distribution (e.g.,
Kallenberg, 1997/2002, p. 56, Lemma 3.22). By Assumption |^b), $\Theta$ is a Borel space,
i.e., there exists a Borel measurable bijection $h : \Theta \to A$, with $A \in \mathcal{B}([0,1])$, s.t.
$h^{-1}$ is also measurable. Denote the c.d.f. of $h(\hat{\theta}_T^*)$ with $F_{h(\hat{\theta}_T^*)}$ and put
$F^{-1}_{h(\hat{\theta}_T^*)}(m) := \inf\{a \in A : F_{h(\hat{\theta}_T^*)}(a) \geq m\}$, $\forall m \in [0,1]$. Then, under Assumption |^a),
for all $B \in \mathcal{E}_\Theta$,

$\mathbb{P}(h^{-1}(F^{-1}_{h(\hat{\theta}_T^*)}(U)) \in B) \overset{(a)}{=} \mathbb{P}(F^{-1}_{h(\hat{\theta}_T^*)}(U) \in h(B)) \overset{(b)}{=} \mathbb{P}(h(\hat{\theta}_T^*) \in h(B)) \overset{(c)}{=} \mathbb{P}(\hat{\theta}_T^* \in B)$,

where (a) and (c) are a consequence of the bimeasurability and bijectivity of $h(.)$, and (b) is an
application of the standard inverse transform method by Assumption |^c). Now, again
by Assumption ^c), $U$ is independent from data. Thus, put $\theta_T^* := h^{-1}(F^{-1}_{h(\hat{\theta}_T^*)}(U))$. □
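
The construction in the proof can be illustrated numerically. The following is only a minimal sketch under assumptions that are not in the paper: the data are i.i.d. Gaussian, the data-based proxy is the sample mean (so that its unconditional c.d.f. is known in closed form), $\Theta$ is the real line so that the inverse c.d.f. can be applied directly, and all constants (`THETA0`, `SIGMA`, `T`, `N_REP`) are hypothetical. It draws the randomization device $U$ separately from the data and compares the generic proxy $F^{-1}(U)$ with the data-based proxy.

```python
import random
from statistics import NormalDist, fmean

THETA0, SIGMA, T = 1.0, 2.0, 25   # hypothetical true mean, std. dev. and sample size
N_REP = 20_000                    # number of simulated samples

rng = random.Random(0)
# Unconditional distribution of the data-based proxy (sample mean): N(theta0, sigma^2 / T).
proxy_dist = NormalDist(mu=THETA0, sigma=SIGMA / T ** 0.5)

data_proxy, generic_proxy = [], []
for _ in range(N_REP):
    sample = [rng.gauss(THETA0, SIGMA) for _ in range(T)]
    data_proxy.append(fmean(sample))                  # function of the data
    u = min(max(rng.random(), 1e-12), 1 - 1e-12)      # randomization device U, kept inside (0, 1)
    generic_proxy.append(proxy_dist.inv_cdf(u))       # generic proxy: inverse c.d.f. applied to U


def corr(x, y):
    """Sample correlation coefficient of two equally long lists."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


mean_gap = abs(fmean(data_proxy) - fmean(generic_proxy))
dependence = corr(data_proxy, generic_proxy)
print(f"gap between means: {mean_gap:.4f}, correlation: {dependence:.4f}")
```

In such a simulation the two proxies should have close empirical means, while their sample correlation should be close to zero, in line with the independence stated in Lemma 1. A zero correlation is of course only a necessary symptom of independence, not a proof of it.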

In this paper, the definitions of neoclassical estimators, confidence regions and tests
are based on the generic proxy. To simplify their statements, we require the following
assumption.


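Assumption 5. Denote the Borel $\sigma$-algebra on $\mathbb{R}$ with $\mathcal{B}(\mathbb{R})$. There is a $\mathcal{E}_\Theta/\mathcal{B}(\mathbb{R})$-measurable
p.d.f. $f_{\theta_T^*}(.)$ s.t., for all $B \in \mathcal{E}_\Theta$, $\mathbb{P}(\theta_T^* \in B) = \int_B f_{\theta_T^*}(\theta)\mu(d\theta)$ m.a.e., where $\mu$ is the
Lebesgue measure $\lambda$, or the counting measure $\nu$.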

By the Radon-Nikodym theorem, Assumption 5 requires the distribution of the generic
finite-sample proxy or its approximation to be a probability measure dominated
by the Lebesgue or the counting measure. In practice, because $\mathbb{P} \circ \theta_T^{*-1}$ is unknown, it
requires the approximation of $\mathbb{P} \circ \theta_T^{*-1}$ to be a probability measure dominated by the
Lebesgue or the counting measure. Instead of requiring the existence of $f_{\theta_T^*}(.)$, we could
apply the Lebesgue decomposition theorem to write m.a.e. the measure $\mathbb{P} \circ \theta_T^{*-1}$ as the
sum of a continuous, a discrete and a singular measure. However, it would complicate
the upcoming definitions without much tangible gain. In particular, under Assumption
5, the neoclassical estimator is simply a maximizer of the p.d.f. $f_{\theta_T^*}$ m.a.e.

Definition 4 (Neoclassical estimator). A neoclassical estimator, denoted $\hat{\theta}_T$, is a maximizer
of the p.d.f. $f_{\theta_T^*}$ m.a.e., i.e.,

$\hat{\theta}_T \in \arg\max_{\theta \in \Theta} f_{\theta_T^*}(\theta)$ m.a.e.


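Example. (continued) $\mathbb{P}(\theta_T^* \in B_r(\theta)) = \int_{B_r(\theta)} \frac{1}{\sqrt{2\pi}\, s_T(X_{1:T}(\omega))/\sqrt{T}} \exp\left[-\frac{1}{2}\left(\frac{\dot{\theta} - \bar{X}_T(\omega)}{s_T(X_{1:T}(\omega))/\sqrt{T}}\right)^2\right] d\dot{\theta}$ m.a.e.

Thus, the neoclassical estimate is the mode of $\mathcal{N}(\bar{X}_T(\omega), s_T(X_{1:T}(\omega))^2/T)$, i.e., $\hat{\theta}_T = \bar{X}_T(\omega)$
m.a.e.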


By Definition 4, a neoclassical estimator is an element of the parameter space $\Theta$ that
has the highest probability density to be the generic finite-sample proxy $\theta_T^*$ m.a.e. Thus, it
is a maximum-probability based estimator. In the neoclassical theory, confidence regions
are also maximum-probability based.

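Definition 5 (Neoclassical confidence region). Denote the support of $f_{\theta_T^*}$ with $\mathrm{supp}(f_{\theta_T^*})$,
i.e., $\mathrm{supp}(f_{\theta_T^*}) := \{\theta \in \Theta : f_{\theta_T^*}(\theta) > 0\}$. A $\mathcal{E}_\Theta$-measurable set, $R_{1-\alpha,T}$, is a neoclassical
confidence region of level $1-\alpha$ with $\alpha \in [0,1]$ if, and only if,

$R_{1-\alpha,T} = \{\theta \in \mathrm{supp}(f_{\theta_T^*}) : f_{\theta_T^*}(\theta) \geq k_{\alpha,T}\}$ m.a.e.,

where

$k_{\alpha,T} := \sup_{k \in \mathbb{R}} \left\{ k : \int_{\{\theta \in \mathrm{supp}(f_{\theta_T^*}) : f_{\theta_T^*}(\theta) \geq k\}} f_{\theta_T^*}(\theta)\mu(d\theta) \geq 1-\alpha \right\}$.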


Example. (continued) Because a Gaussian distribution is unimodal and symmetric with
respect to its mean, $R_{1-\alpha,T} = \left[\bar{X}_T(\omega) - \frac{s_T(X_{1:T}(\omega))}{\sqrt{T}} u_{1-\alpha/2},\ \bar{X}_T(\omega) - \frac{s_T(X_{1:T}(\omega))}{\sqrt{T}} u_{\alpha/2}\right]$ m.a.e.,
where $u_{\alpha/2}$ denotes the $\alpha/2$ quantile of a standard Gaussian distribution. o

Remark 7. The existence of neoclassical confidence regions is typically not a concern.
Appendix on p. 49 proves the existence of neoclassical confidence regions under mild
assumptions. o


By construction, a neoclassical confidence region is an indicator of the confidence we
can have in a neoclassical estimate. It is the set of parameter values that are the closest
to being the neoclassical estimate, such that the whole set has a probability at least $1-\alpha$
to contain the generic finite-sample proxy $\theta_T^*$ m.a.e. Thus, a small connected neoclassical
confidence region indicates a well-separated estimate, which is reliable. In contrast, a
large neoclassical confidence region or a neoclassical confidence region that consists of the
union of disjoint sets indicates an unreliable estimate.
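
The threshold construction of Definition 5 can be sketched numerically. The code below is an illustration under assumptions that are not in the paper: the p.d.f. is evaluated on an equally spaced grid, the integral is replaced by a Riemann sum, and the two densities (a standard Gaussian and a two-component Gaussian mixture) are hypothetical. It keeps grid points in decreasing order of density until their mass reaches $1-\alpha$, which approximates the cut-off $k_{\alpha,T}$, and then groups the retained points into intervals.

```python
from statistics import NormalDist


def highest_density_region(grid, density, alpha):
    """Grid points with density >= k, where k is the largest cut-off keeping mass >= 1 - alpha."""
    step = grid[1] - grid[0]                     # equally spaced grid assumed
    mass = [d * step for d in density]           # Riemann approximation of the integral
    total = sum(mass)
    order = sorted(range(len(grid)), key=lambda i: density[i], reverse=True)
    kept, acc = [], 0.0
    for i in order:                              # add points from the highest density down
        kept.append(i)
        acc += mass[i] / total
        if acc >= 1 - alpha:
            break
    k = density[kept[-1]]                        # approximate cut-off k
    return sorted(grid[i] for i in kept), k


def intervals(points, step):
    """Group sorted grid points into maximal runs of adjacent points."""
    runs = [[points[0], points[0]]]
    for p in points[1:]:
        if p - runs[-1][1] > 1.5 * step:
            runs.append([p, p])
        else:
            runs[-1][1] = p
    return runs


step = 0.01
grid = [-6 + step * i for i in range(1201)]      # grid on [-6, 6]
one_mode = [NormalDist(0, 1).pdf(x) for x in grid]
two_modes = [0.5 * NormalDist(-3, 0.5).pdf(x) + 0.5 * NormalDist(3, 0.5).pdf(x) for x in grid]

region1, k1 = highest_density_region(grid, one_mode, alpha=0.05)
region2, k2 = highest_density_region(grid, two_modes, alpha=0.05)
runs1 = intervals(region1, step)
runs2 = intervals(region2, step)
print(runs1)   # a single interval around the mode
print(runs2)   # two disjoint intervals, one around each mode
```

With the unimodal density the region is a single interval around the mode, whereas the bimodal density yields a union of two disjoint intervals, the situation that the text describes as signalling an unreliable estimate.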


Remark 8. If the purpose of confidence regions is to indicate the confidence we can have in
an estimate, their neoclassical definition is more satisfactory than their Neyman-Pearson
definition (inadequacies caused by multiple use of the data and their past realization set
aside). The Neyman-Pearson definition of confidence regions is not about the estimate,
but about coverage. In particular, Neyman-Pearson confidence regions do not necessarily
contain the estimate. o


Remark 9. In Definition]^ the definition of neoclassical estimator is formally close to the
definition of Bayesian highest posterior density (HPD) sets (e.g., Berger, 1980/2006, sec.
4.3.2., Definition 5), although their theoretical justification and meaning are fundamentally
different. o

Although Definition 5 corresponds to a joint confidence region, marginal and conditional
neoclassical confidence regions can also be defined by considering the marginal and
conditional distribution of $\theta_T^*$. From neoclassical confidence regions, we define neoclassical
tests.

Definition 6 (Neoclassical test). Let $H : \theta_0 = \theta$ be a test hypothesis, and $R_{1-\alpha,T}$ a $1-\alpha$
neoclassical confidence region, where $\alpha \in [0,1]$. As in Definition^ denote the decision
space with $D := \{d_H, d_A\}$. A neoclassical test of level $\alpha$ for $H$ is a decision rule, denoted
$d_T$, s.t. if

$\theta \in R_{1-\alpha,T}$ m.a.e.,

then $d_T = d_H$; otherwise $d_T = d_A$.


Definition 6 leads the econometrician to reject hypotheses that do not correspond to
the set of parameter values with the highest probability density of being equal to the
generic proxy $\theta_T^*$ m.a.e. By Definitions]^ all elements in a neoclassical confidence region
have a higher probability density of being equal to the generic finite-sample proxy than
the ones outside it m.a.e.


Example. (continued) If $\theta \in \left[\bar{X}_T(\omega) - \frac{s_T(X_{1:T}(\omega))}{\sqrt{T}} u_{1-\alpha/2},\ \bar{X}_T(\omega) - \frac{s_T(X_{1:T}(\omega))}{\sqrt{T}} u_{\alpha/2}\right]$, we
do not reject the test hypothesis, i.e., $d_T = d_H$. Note that, in this example, the neoclassical
estimate, confidence region and test are practically equivalent to their usual
Neyman-Pearson counterparts, although their theoretical justification is different. Nevertheless,
there are Neyman-Pearson confidence regions and tests that do not practically
correspond to neoclassical confidence regions or tests. E.g., under the assumption that
data $X_{1:T}$ have not been realized prior to the decision to compute the confidence interval

$\left]-\infty,\ \bar{X}_T(\omega) - \frac{s_T(X_{1:T}(\omega))}{\sqrt{T}} u_{.5+\alpha/2}\right] \cup \left[\bar{X}_T(\omega) - \frac{s_T(X_{1:T}(\omega))}{\sqrt{T}} u_{.5-\alpha/2},\ \infty\right[$,

the latter is a valid $1-\alpha$ Neyman-Pearson confidence region, while it is not a neoclassical
confidence region.

o
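
The decision rule of the Example, and the contrast with the two-tailed Neyman-Pearson region, can be sketched as follows. This is an illustration only: the function names and the numerical inputs (`xbar`, `s`, `T`) are hypothetical, and the coverage computation relies on the Gaussian approximation of the Example.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Gaussian


def neoclassical_interval(xbar, s, T, alpha):
    """Interval [xbar - (s/sqrt(T)) u_{1-alpha/2}, xbar - (s/sqrt(T)) u_{alpha/2}]."""
    half = s / T ** 0.5
    return (xbar - half * Z.inv_cdf(1 - alpha / 2), xbar - half * Z.inv_cdf(alpha / 2))


def neoclassical_test(theta, xbar, s, T, alpha):
    """Return 'd_H' (do not reject) if theta lies in the region, 'd_A' otherwise."""
    low, high = neoclassical_interval(xbar, s, T, alpha)
    return "d_H" if low <= theta <= high else "d_A"


def in_two_tail_region(theta, xbar, s, T, alpha):
    """Membership in ]-inf, xbar - (s/sqrt(T)) u_{.5+alpha/2}] U [xbar - (s/sqrt(T)) u_{.5-alpha/2}, inf[."""
    half = s / T ** 0.5
    return (theta <= xbar - half * Z.inv_cdf(0.5 + alpha / 2)
            or theta >= xbar - half * Z.inv_cdf(0.5 - alpha / 2))


def two_tail_region_coverage(alpha):
    """Ex ante coverage of the two-tail region: P(Z >= u_{.5+alpha/2}) + P(Z <= u_{.5-alpha/2})."""
    return (1 - Z.cdf(Z.inv_cdf(0.5 + alpha / 2))) + Z.cdf(Z.inv_cdf(0.5 - alpha / 2))


xbar, s, T, alpha = 0.3, 1.0, 100, 0.05           # hypothetical sample mean, std. dev., size, level
print(neoclassical_interval(xbar, s, T, alpha))
print(neoclassical_test(0.0, xbar, s, T, alpha), neoclassical_test(0.4, xbar, s, T, alpha))
print(two_tail_region_coverage(alpha), in_two_tail_region(xbar, xbar, s, T, alpha))
```

Under these assumptions the two-tail region has ex ante coverage $1-\alpha$ but excludes the estimate $\bar{X}_T(\omega)$ itself, which is the point made in Remark 8.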


Remark 10. While the neoclassical definition of confidence regions appears more satisfactory
than their Neyman-Pearson definition (see Remark 8), the reverse seems to be
true for tests (inadequacies caused by multiple use of the data and their past realization
set aside). Unlike Neyman-Pearson tests, neoclassical tests do not directly control the
probability of making an error, so that their outcome should be understood in terms of
evidence in favor of or against the hypothesis. However, it should be noted that Neyman-Pearson
tests control the probability of type I error only ex ante: after computation of the
test statistic, the probability of error is 0 or 1. Moreover, work in progress by the authors
suggests that the direct control of type I error can be regained within the neoclassical
theory. o


Remark 11. When $\mu = \lambda$, the precise choice of $f_{\theta_T^*}$ is typically crucial for Definitions
4-6: a modification of the p.d.f. $f_{\theta_T^*}$ on a $\lambda$-null set yields another p.d.f. of $\mathbb{P} \circ \theta_T^{*-1}$
w.r.t. $\lambda$ m.a.e. that can lead to a completely different estimate, confidence region and
result of a test (see subsection 5.2). This peculiarity, from which we take advantage
in subsection 5.2 (see Remark 17, p. 31), also arises in Neyman-Pearson and Bayesian
theories (e.g., Gourieroux and Monfort, 1989/1996, sec. 7.A.2). Nevertheless, under the
mild assumption that $\theta \in \Theta$ is a Lebesgue point, by Lebesgue's differentiation theorem
(e.g., Folland, 1984/1999, Theorem 3.21), $f_{\theta_T^*}(\theta)$ is often s.t. $f_{\theta_T^*}(\theta) = \lim_{r \downarrow 0} \frac{\mathbb{P}(\theta_T^* \in B_r(\theta))}{\lambda(B_r(\theta))}$
m.a.e., where $B_r(\theta)$ denotes a ball in $\Theta$ centered at $\theta$ with radius $r > 0$. o


Remark 12. As Definitions 4, 5 and 6, respectively, indicate, neoclassical estimators,
confidence regions and tests are not random m.a.e., and do not depend on the realized data
m.a.e. In the examples, their dependence on the realized data is only due to the
approximation error. o


4.3. Neoclassical theory and multiple use of the same data. The upcoming Theorem 3
investigates the adequacy of the neoclassical theory when data have already been
used, and thus are known. Because the neoclassical theory is based on the distribution of
the generic proxy $\theta_T^*$, it is sufficient to investigate the effect of previous knowledge of the
realized data on this distribution.

Theorem 3 (Neoclassical adequacy). Under Assumptions^ and[^ for all $B \in \mathcal{E}_\Theta$,

i) for all $A_T \in \mathcal{S}_T$, $\{X_{1:T} \in A_T\}$ and $\{\theta_T^* \in B\}$ are independent m.a.e., i.e.,

$\mathbb{P}(\{\theta_T^* \in B\} \cap \{X_{1:T} \in A_T\}) = \mathbb{P}(\theta_T^* \in B)\,\mathbb{P}(X_{1:T} \in A_T)$ m.a.e.;

ii) for all $A_T \in \mathcal{S}_T$ s.t. $\mathbb{P}(X_{1:T} \in A_T) > 0$,

$\mathbb{P}(\theta_T^* \in B | X_{1:T} \in A_T) = \mathbb{P}(\theta_T^* \in B)$ m.a.e.

Proof. It is a consequence of Lemma 1. i) By Lemma 1, $\theta_T^*$ and $X_{1:T}$ are independent,
so that all events in $\sigma(\theta_T^*)$ and $\sigma(X_{1:T})$ are independent (e.g., Kallenberg, 1997/2002, p.
50). ii) Using (i), replace in the proof of Theorem the nonequal sign by an equal sign,
and set $E = \{\theta_T^* \in B\}$. □

Theorem 3 shows that the distribution of the generic proxy $\theta_T^*$ is immune to previous
knowledge (or realization) of the data. Then, the adequacy of neoclassical confidence
regions and tests follows.


Corollary 1 (Neoclassical confidence region and test adequacy). Let $R_{1-\alpha,T}$ be a neoclassical
$1-\alpha$ confidence region for $\theta_0$. Under Assumptions^ and[^ for all $A_T \in \mathcal{S}_T$ s.t.
$\mathbb{P}(X_{1:T} \in A_T) > 0$,

$\mathbb{P}(\theta_T^* \in R_{1-\alpha,T} | X_{1:T} \in A_T) = \mathbb{P}(\theta_T^* \in R_{1-\alpha,T})$ m.a.e.

Proof. Apply Theorem 3(ii) putting $B = R_{1-\alpha,T}$. □
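
The content of Theorem 3(ii) can be illustrated by simulation. The sketch below rests on assumptions that are not in the paper: i.i.d. Gaussian data, the sample mean as data-based proxy, the data event $A_T = \{X_1 > \theta_0\}$, the set $B = [\theta_0, \infty[$, and hypothetical constants. It compares unconditional and conditional frequencies for the generic proxy and, for contrast, for the data-based proxy.

```python
import random
from statistics import NormalDist, fmean

THETA0, SIGMA, T, N_REP = 0.0, 1.0, 10, 40_000   # hypothetical design
rng = random.Random(1)
proxy_dist = NormalDist(THETA0, SIGMA / T ** 0.5)  # unconditional law of the sample mean

in_A, data_in_B, generic_in_B = [], [], []
for _ in range(N_REP):
    sample = [rng.gauss(THETA0, SIGMA) for _ in range(T)]
    u = min(max(rng.random(), 1e-12), 1 - 1e-12)          # randomization device U
    in_A.append(sample[0] > THETA0)                       # data event {X_1 > theta0}
    data_in_B.append(fmean(sample) >= THETA0)             # data-based proxy in B
    generic_in_B.append(proxy_dist.inv_cdf(u) >= THETA0)  # generic proxy in B


def share(flags):
    """Unconditional relative frequency of an event."""
    return sum(flags) / len(flags)


def share_given_A(flags):
    """Relative frequency of an event among the draws where the data event A occurred."""
    kept = [f for f, a in zip(flags, in_A) if a]
    return sum(kept) / len(kept)


print("generic proxy:   ", share(generic_in_B), share_given_A(generic_in_B))
print("data-based proxy:", share(data_in_B), share_given_A(data_in_B))
```

In such a design, conditioning on the data event should leave the frequency for the generic proxy essentially unchanged, while it shifts the frequency for the data-based proxy, which is the asymmetry that the theorem formalizes.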


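Example. (continued) For clarity, we now explicitly distinguish between the fixed $\omega \in \Omega$
due to the approximation error and the random elements of the sample space. We denote
the latter ones with $\tilde{\omega}$. By Theorem 3 (i), m.a.e.,

$\mathbb{P}\{\tilde{\omega} \in \Omega : \theta_T^*(\tilde{\omega}) \in B \,|\, \tilde{\omega} \in \Omega : X_{1:T}(\tilde{\omega}) \in A_T\}$
$= \mathbb{P}(\theta_T^* \in B | X_{1:T} \in A_T) = \mathbb{P}(\theta_T^* \in B) = \mathbb{P}\{\tilde{\omega} \in \Omega : \theta_T^*(\tilde{\omega}) \in B\}$
$= \int_B \frac{1}{\sqrt{2\pi}\,[s_T(X_{1:T}(\omega))/\sqrt{T}]} \exp\left[-\frac{1}{2}\left(\frac{\theta - \bar{X}_T(\omega)}{s_T(X_{1:T}(\omega))/\sqrt{T}}\right)^2\right] \lambda(d\theta)$.

By Corollary 1, m.a.e.,

$\mathbb{P}\left\{\tilde{\omega} \in \Omega : \theta_T^*(\tilde{\omega}) \in \left[\bar{X}_T(\omega) - \frac{s_T(X_{1:T}(\omega))}{\sqrt{T}} u_{1-\alpha/2},\ \bar{X}_T(\omega) - \frac{s_T(X_{1:T}(\omega))}{\sqrt{T}} u_{\alpha/2}\right] \,\Big|\, \tilde{\omega} \in \Omega : X_{1:T}(\tilde{\omega}) \in A_T\right\}$
$= \mathbb{P}(\theta_T^* \in R_{1-\alpha,T} | X_{1:T} \in A_T) = \mathbb{P}(\theta_T^* \in R_{1-\alpha,T})$
$= \mathbb{P}\left\{\tilde{\omega} \in \Omega : \theta_T^*(\tilde{\omega}) \in \left[\bar{X}_T(\omega) - \frac{s_T(X_{1:T}(\omega))}{\sqrt{T}} u_{1-\alpha/2},\ \bar{X}_T(\omega) - \frac{s_T(X_{1:T}(\omega))}{\sqrt{T}} u_{\alpha/2}\right]\right\}$.

The distinction between the fixed $\omega$ and the varying $\tilde{\omega}$ is essential. The fixed $\omega$ can be
ignored as long as the approximation error is negligible, i.e., the approximation is justified.
o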


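Remark 13. Corollary [^ii) does not mean or imply that, for all $A_T \in \mathcal{S}_T$, $\lim_{T \to \infty} \mathbb{P}(\theta_T^* \in
R_{1-\alpha,T} | X_{1:T} \in A_T) = \lim_{T \to \infty} \mathbb{P}(\theta_T^* \in R_{1-\alpha,T})$. First, in the neoclassical theory, approximation
errors do not necessarily come from asymptotic approximation (see Remark]^).
Second, even in the case in which the whole approximation error would come from an
asymptotic approximation, Corollary [^ii) only implies independence between $X_{1:\infty}$ and
any neoclassical confidence region $R_{1-\alpha,\infty}$ deduced from $\lim_{T \to \infty} \hat{\mathbb{P}} \circ \theta_T^{*-1}$, where $\hat{\mathbb{P}} \circ \theta_T^{*-1}$
denotes an approximation of the distribution of the generic proxy $\theta_T^*$. E.g., in the Example,
$\hat{\mathbb{P}} \circ \theta_T^{*-1} = \mathcal{N}(\bar{X}_T(\omega), s_T(X_{1:T}(\omega))^2/T)$, so that $\lim_{T \to \infty} \hat{\mathbb{P}} \circ \theta_T^{*-1} = \delta_{\theta_0}$ $\mathbb{P}$-a.s., which, in turn,
implies $R_{1-\alpha,\infty} = \{\theta_0\}$ $\mathbb{P}$-a.s. Therefore, $\{\tilde{\omega} \in \Omega : \theta_0 \in R_{1-\alpha,\infty}\} = \{\tilde{\omega} \in \Omega : \theta_0 \in \{\theta_0\}\}$
has probability one, and is independent from the data $X_{1:\infty}$. Note that, in the Neyman-Pearson
theory, we would need to consider

$\lim_{T \to \infty} \mathbb{P}\left(\theta_0 \in \left[\bar{X}_T - \frac{s_T(X_{1:T})}{\sqrt{T}} u_{1-\alpha/2},\ \bar{X}_T - \frac{s_T(X_{1:T})}{\sqrt{T}} u_{\alpha/2}\right]\right) = 1-\alpha$,

where $\left[\bar{X}_T - \frac{s_T(X_{1:T})}{\sqrt{T}} u_{1-\alpha/2},\ \bar{X}_T - \frac{s_T(X_{1:T})}{\sqrt{T}} u_{\alpha/2}\right]$ depends on the data, and is random
even asymptotically. This difference between the two theories should help to understand
why, in the Example, the same finite-sample confidence interval depends on the data
m.a.e. for the Neyman-Pearson theory, while it does not depend on the data m.a.e. for
the neoclassical theory. o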


Remark 14. In this paper, the generic proxy $\theta_T^*$ is introduced for expository purpose,
i.e., to allow the use of probability symbolism. From a strict logical point of view, the
immunity of the unconditional distribution of $\theta_T^*$ to multiple use of the same data is all
that is needed for the neoclassical adequacy: by definition the unconditional distribution
of $\theta_T^*$ is about all the possible values of $\hat{\theta}_T^*$ induced by all the possible samples that could
have been observed. In other words, the key difference between the Neyman-Pearson and
Bayesian theories on the one hand, and the neoclassical theory on the other hand is that,
in the latter, inference exclusively relies on an unconditional distribution m.a.e., while, in
the others, inference relies on the realized data, even m.a.e. o

The probabilistic statements, on which neoclassical estimators, confidence regions and
tests are based, are immune to previous information about the data, and thus to multiple
use of the same data m.a.e. To our knowledge, the neoclassical theory is the first general
inference theory immune to multiple use of the same data m.a.e.


5. A NEOCLASSICAL POINT OF VIEW ON SOME CALIBRATION AND ECONOMETRIC PRACTICES.

This section aims at presenting some prominent practices from the point of view of
the neoclassical inference theory. The elementary version of the theory outlined in
subsection 4.2 is sufficient for this purpose. By-products of the current section are examples
of implementation of the neoclassical theory, novel theoretical justifications for
the presented calibration and econometric practices, and a standard-error adjustment to
account for approximation errors.

Subsection 5.1 discusses requirements for proxies and approximations of their distribution.
Subsection 5.2 presents choices of proxies and of approximations that correspond to
different econometric and calibration practices. Subsection 5.3 assesses the most common
econometric practice through Monte-Carlo simulations, and presents the standard-error
adjustment. Because, in this section, we discuss the choice of approximations, we distinguish
between the distribution of the generic proxy $\mathbb{P} \circ \theta_T^{*-1}$ and its chosen approximation,
which we denote $\hat{\mathbb{P}} \circ \theta_T^{*-1} = \int_{\cdot} \hat{f}_{\theta_T^*}(\theta)\mu(d\theta)$.

5.1. On generic proxies and approximations of their distribution. An implementation
of the neoclassical theory requires two inputs: a generic proxy and an approximation
of its distribution. These inputs do not have to satisfy any particular criteria other than
being considered a proxy of $\theta_0$, and an approximation of $\mathbb{P} \circ \theta_T^{*-1}$, respectively. In particular,
the neoclassical theory does not require consistency of any of the two: consistency is
about situations where the number of observations can be infinitely increased, while practice
is necessarily based on a bounded number of them. Nevertheless, hereafter, except
in the subsection 5.2.1 about calibration, we focus on asymptotically normal proxies and
consistent approximations, so that we can rely on insights from the asymptotic theory:
the proxy typically corresponds to what is called an estimator in the Bayesian or Neyman-Pearson
theory. The following Assumption 6 requires asymptotic normality of $\theta_T^*$, which
is a property of most estimators considered in the Neyman-Pearson and Bayesian theories
(e.g., Chernozhukov and Hong, 2003).

Assumption 6 (Asymptotic normality of $\theta_T^*$). The generic proxy of $\theta_0$ is asymptotically
normal, i.e., (by Assumption ^c)) there exist a random variable and a sequence of
random variables on $(\Omega, \mathcal{E}_\Omega)$ s.t.

0 * = 00 + ^ 1/2 + 

where P o ~ A/'(0, and as T ^ oo. 

Assumption 6 means that the generic proxy $\theta_T^*$ asymptotically converges to $\theta_0$ as a
Gaussian random variable centered at $\theta_0$ with a standard deviation that goes to zero at
rate $\sqrt{T}$. We could weaken Assumption 6 to allow rates of convergence different from $\sqrt{T}$,
or to allow different limiting distributions (e.g., Dickey-Fuller distributions), but it would
complicate the presentation. The following Assumption 7 requires the approximation of
the distribution of the generic proxy to be consistent.

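Assumption 7 (Consistency of $\hat{\mathbb{P}} \circ \theta_T^{*-1}$). The approximation of the distribution of the
generic proxy is consistent, i.e., as $T \to \infty$,

$\rho(\hat{\mathbb{P}} \circ \theta_T^{*-1}, \mathbb{P} \circ \theta_T^{*-1}) \overset{\mathbb{P}}{\to} 0$,

where $\rho(.,.)$ denotes a metric on the space of probability measures on $(\Theta, \mathcal{E}_\Theta)$.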

Assumption 7 means that the distribution of the generic proxy and its approximation
converge to each other as the number of observations increases. In the Appendix, we
verify Assumption 7 for the approximations considered in this paper. In practice, the
distribution of the generic proxy, $\mathbb{P} \circ \theta_T^{*-1}$, is typically unknown, so that Assumption 7
cannot be directly verified. Nevertheless, the asymptotic limit of $\mathbb{P} \circ \theta_T^{*-1}$ is often known,
so that the following lemma provides a usable criterion for checking Assumption 7.

Lemma 2. Under Assumptions[^ and|^ if there exists a probability measure $\mathbb{P} \circ \theta_\infty^{*-1}$ on
$(\Theta, \mathcal{E}_\Theta)$ s.t., as $T \to \infty$,

(a) $\rho(\hat{\mathbb{P}} \circ \theta_T^{*-1}, \mathbb{P} \circ \theta_\infty^{*-1}) \overset{\mathbb{P}}{\to} 0$ and

(b) $\rho(\mathbb{P} \circ \theta_T^{*-1}, \mathbb{P} \circ \theta_\infty^{*-1}) \to 0$,

then $\hat{\mathbb{P}} \circ \theta_T^{*-1}$ is a consistent approximation of $\mathbb{P} \circ \theta_T^{*-1}$, i.e., Assumption 7 holds.

Proof. Triangle inequality yields $\rho(\hat{\mathbb{P}} \circ \theta_T^{*-1}, \mathbb{P} \circ \theta_T^{*-1}) \leq \rho(\hat{\mathbb{P}} \circ \theta_T^{*-1}, \mathbb{P} \circ \theta_\infty^{*-1}) + \rho(\mathbb{P} \circ \theta_\infty^{*-1},
\mathbb{P} \circ \theta_T^{*-1})$, where the two terms of the RHS go to zero in probability as $T \to \infty$ by
(a) and (b), respectively. □


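Example. (continued) Let $\rho$ be the Prokhorov metric on the space of probability measures
on $(\Theta, \mathcal{E}_\Theta)$. The Prokhorov metric generates the topology of the convergence in
law (e.g., Billingsley, 1968/1999, pp. 72-73), which, in turn, corresponds to the pointwise
convergence of cumulative distribution functions (c.d.f.) at continuity points of the
limiting c.d.f. (Portmanteau theorem). Denote the c.d.f. of the Gaussian distribution
$\mathcal{N}(r, s)$ with $\mathfrak{N}(.; r; s)$. Then, for all $\theta \in \Theta \setminus \{\theta_0\}$, $\lim_{T \to \infty} \mathfrak{N}(\theta; \theta_0; \frac{\sigma^2}{T}) = 1_{[\theta_0,\infty[}(\theta)$, because
$\mathfrak{N}(\theta; \theta_0; \frac{\sigma^2}{T}) = \mathfrak{N}(\sqrt{T}\frac{\theta - \theta_0}{\sigma}; 0; 1)$, and $\lim_{T \to \infty} \sqrt{T}\frac{\theta - \theta_0}{\sigma} = -\infty$, if $\theta < \theta_0$, and $\infty$
otherwise. Similarly, for all $\theta \in \Theta \setminus \{\theta_0\}$, $\lim_{T \to \infty} \mathfrak{N}(\theta; \bar{X}_T; \frac{s_T(X_{1:T})^2}{T}) = 1_{[\theta_0,\infty[}(\theta)$, and
$\lim_{T \to \infty} 1_{[\bar{X}_T,\infty[}(\theta) = 1_{[\theta_0,\infty[}(\theta)$ $\mathbb{P}$-a.s. (see also Appendix F.2.1, Proposition on p. 61).
Thus, by Lemma 2, $\delta_{\bar{X}_T(\omega)}$ and $\mathcal{N}(\bar{X}_T(\omega), \frac{s_T(X_{1:T}(\omega))^2}{T})$ are consistent approximations of
the distribution of the generic proxy $\bar{X}_T^*$, $\mathcal{N}(\theta_0, \frac{\sigma^2}{T})$.

o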
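
The pointwise convergence used in the Example can be checked with a few numbers. The sketch below is only illustrative: the values of $\theta_0$, $\sigma$ and the evaluation points are hypothetical. It evaluates the c.d.f. of $\mathcal{N}(\theta_0, \sigma^2/T)$ through the standardization $\sqrt{T}(\theta - \theta_0)/\sigma$ at one point below and one point above $\theta_0$, for increasing $T$.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Gaussian


def gaussian_cdf(theta, mean, variance):
    """C.d.f. of N(mean, variance) at theta, computed via standardization."""
    return Z.cdf((theta - mean) / variance ** 0.5)


THETA0, SIGMA = 0.0, 1.0                      # hypothetical values
for T in (10, 1_000, 100_000):
    below = gaussian_cdf(-0.1, THETA0, SIGMA ** 2 / T)   # theta < theta0
    above = gaussian_cdf(0.1, THETA0, SIGMA ** 2 / T)    # theta > theta0
    print(T, round(below, 6), round(above, 6))
```

As $T$ grows, the value below $\theta_0$ should approach 0 and the value above $\theta_0$ should approach 1, i.e., the c.d.f. approaches the indicator $1_{[\theta_0,\infty[}$ away from $\theta_0$.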


Remark 15. Unlike for Neyman-Pearson and Bayesian theories, implementations of the
elementary version of the neoclassical inference theory presented in this paper seem to
always rely on an approximation, the approximation of the distribution of the generic
proxy. This is a disadvantage of the elementary version of the neoclassical theory. However,
firstly, most econometric practices rely on approximations: implementation of
the Neyman-Pearson and Bayesian theories typically requires asymptotic (e.g., CLT) or
computational (e.g., Markov Chain Monte Carlo algorithms) approximations. Secondly,
in some particular cases, we can bound the approximation error in probability, or even
derive the exact distribution of $\mathbb{P} \circ \theta_T^{*-1}$, and of $\rho(\hat{\mathbb{P}} \circ \theta_T^{*-1}, \mathbb{P} \circ \theta_T^{*-1})$: see Appendix
on p. 51. Thirdly, in practice, there is a trade-off between approximations and the
approximative assumptions that are needed to avoid the (explicit) approximations.


5.2. Some practices in neoclassical terms. In this subsection, we frame some calibration
and econometric practices in terms of the neoclassical theory, so that they are
provided with a theoretical foundation immune to multiple use of the same data m.a.e.
For convenience and brevity, we present the practices according to the kind of approximations
of $\mathbb{P} \circ \theta_T^{*-1}$ they rely on. Thus, practices that combine different approximations
(e.g., mix of calibration and econometrics in Canova, 2007, chap. 7) are only indirectly
treated. Appendix F studies the asymptotic limit of the approximations considered in this
subsection, while Table in Appendix G provides a panoramic view of them.


5.2.1. Model-calibration approximations. By calibration, we mean the more or less formal
process through which the parameter values of a model are selected in view of data.
In finance and economics, this process often corresponds to the minimization of some
goodness-of-fit measure, or to the choice of estimates from various existing empirical
studies. While model calibration is used in many fields (e.g., Oreskes, Shrader-Frechette
and Belitz, 1994), it has become common in economics and finance with the development
of general-equilibrium models (Johansen, 1960; Kydland and Prescott, 1982; Shoven and
Whalley, 1984) and derivatives pricing, respectively. We distinguish two types of calibration:
plain calibration and criterion-adjusted calibration.

Plain calibration. In plain calibration, the selected parameter values are just plugged into
the model in lieu of the unknown parameter $\theta_0$. No information about potential calibration
error or model-specification error is reported. Plain calibration is widely used to price
derivatives in finance (e.g., Cont, 2010, p. 1217). If the selected parameter values are
assumed to be a realization of a random variable under Assumptions and plain 
calibration can be regarded as an implementation of the neoclassical theory s.t. 

• the generic proxy $\theta_{T,C}^*$ is a random variable that has the same unconditional distribution
as $\hat{\theta}_{T,C}^*$, i.e., $\theta_{T,C}^* = F^{-1}_{\hat{\theta}_{T,C}^*}(U_C)$, where $F^{-1}_{\hat{\theta}_{T,C}^*}$ denotes the inverse of the
unconditional c.d.f. of $\hat{\theta}_{T,C}^*$, and $U_C$ a random variable uniformly distributed on
[0,1] that is independent from the data;

• the approximation of the unconditional distribution of the proxy is the unit point
mass at the selected parameter value $\hat{\theta}_{T,C}^*(\omega)$, i.e., for all $\theta \in \Theta$,

$\hat{f}_{\theta_{T,C}^*}(\theta) = 1_{\{\hat{\theta}_{T,C}^*(\omega)\}}(\theta)$ with $\mu = \nu$.

Then, by Definitions 4 and 5, both the neoclassical estimate and confidence region correspond
to the calibrated value, i.e., $\hat{\theta}_T = \hat{\theta}_{T,C}^*(\omega)$ and $R_{1-\alpha,T} = \{\hat{\theta}_{T,C}^*(\omega)\}$ m.a.e. If the
selected parameter value is close to $\theta_0$ (e.g., the proxy $\hat{\theta}_{T,C}^*$ is consistent and the number of
observations $T$ is large as in Proposition [^i) in Appendix F.1), plain calibration may be
sufficient. However, in economics and finance, this is not often the case, so that indications
of the potential calibration error and model-specification error are often needed.

Criterion-adjusted calibration. By criterion-adjusted calibration, we mean a plain calibration
accompanied by indications of calibration error and model-specification error
based on nonstatistical criteria. The indication of calibration error comes from the determination
of a range of plausible values for the model parameter. The indication of
model-specification error comes from the computation of measures of discrepancy between the
calibrated model and the data (e.g., difference between moments of the calibrated model
and moments of the data).

To cast criterion-adjusted calibration as an implementation of the neoclassical theory,
it is useful to introduce new notation. We denote the model parameter and the measures
of discrepancy with $\beta$ and $\Delta$, respectively. We also denote the selected value for $\beta$ with
and the measure of discrepancy implied by with The parameter $\beta$ of
the model of interest is not to be confused with the global parameter $\theta := (\beta'\ \Delta')'$.
The determination of the range of plausible values for $\beta$ and acceptable values for $\Delta$
can be formalized by a positive criterion function $u : (\theta_e, \theta) \mapsto u(\theta_e, \theta)$, which indicates the
adequacy between its two arguments, and which equals zero outside $\Theta^2$. The criterion
function $u$ is maximized when its two arguments are equal, i.e., for all $\theta_e \in \Theta$ and $\theta \in
\Theta \setminus \{\theta_e\}$, $u(\theta_e, \theta) \leq u(\theta, \theta)$. With this notation, and under the assumption that the following
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quantities exist, criterion-adjusted calibration can be regarded as an implementation of 
the neoclassical theory s.t. 

• the generic proxy is := F~^ 1 ^. (f/’c-), where ^ is a conditional 

c.d.f. s.t. := with $]-\infty, \theta] := ]-\infty, \theta_1] \times ]-\infty, \theta_2] \times \cdots \times ]-\infty, \theta_p]$
for $(\theta_1\ \theta_2\ \theta_3 \cdots \theta_p)' := \theta \in \Theta$, and where $U_{CC}$ is a random
variable uniformly distributed on [0,1] that is independent from the data and $\hat{\theta}_{T,C}^*$;

• the approximation of the distribution of the generic proxy $\theta_{T,CC}^*$ is the normalized
criterion function, i.e., for all $\theta \in \Theta$,

$\hat{f}_{\theta_{T,CC}^*}(\theta) \propto u(\theta, \hat{\theta}_{T,C}^*(\omega))$ with $\mu = \lambda$.
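
A normalized criterion function can be turned into a neoclassical estimate and confidence region numerically. The sketch below is only an illustration: the triangular criterion, its width, the selected value 0.99 and the grid are all hypothetical choices, not taken from the paper, and the region is reported as an interval because this particular criterion is unimodal.

```python
def criterion(theta, selected, width):
    """Hypothetical criterion u: 1 at the selected value, linearly decreasing, 0 beyond `width`."""
    return max(0.0, 1.0 - abs(theta - selected) / width)


def normalized_criterion(grid, selected, width):
    """Criterion function rescaled to integrate to one over an equally spaced grid."""
    step = grid[1] - grid[0]
    raw = [criterion(x, selected, width) for x in grid]
    total = sum(raw) * step
    return [r / total for r in raw]


def mode_and_region(grid, density, alpha):
    """Mode of the density and the interval spanned by its 1 - alpha highest-density points."""
    step = grid[1] - grid[0]
    order = sorted(range(len(grid)), key=lambda i: density[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:                       # accumulate mass from the highest density down
        kept.append(grid[i])
        mass += density[i] * step
        if mass >= 1 - alpha:
            break
    return grid[order[0]], (min(kept), max(kept))   # interval is valid for a unimodal density


step = 0.0001
grid = [0.95 + step * i for i in range(801)]        # grid on [0.95, 1.03]
density = normalized_criterion(grid, selected=0.99, width=0.02)
estimate, (low, high) = mode_and_region(grid, density, alpha=0.05)
print(estimate, low, high)
```

With a criterion that peaks at the selected value, the estimate coincides with the calibrated value, and the region is the range of parameter values that the criterion treats as most plausible, which is how the text interprets the indication of calibration error.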


5.2.2. Gaussian approximation. The Gaussian approximation is one of the most widely
used approximations. Under the assumption that the unconditional distribution of the
t-statistic $\sqrt{T}\,\Sigma_T^{-1/2}(\hat{\theta}_{T,G}^* - \theta_0)$ converges asymptotically to a standard Gaussian $\mathcal{N}(0, I)$,
econometricians typically deduce univariate confidence regions and sets of nonrejected
point hypotheses, , where $\hat{\theta}_{T,G,k}^*$, $\Sigma_{T,kk}$ and
$u_{\alpha/2}$, respectively, denote the $k$-th element of the random vector $\hat{\theta}_{T,G}^*$, the $k$-th diagonal
element of the matrix $\Sigma_T$, and the $\alpha/2$ quantile of a standard univariate Gaussian $\mathcal{N}(0,1)$.
As in the Example of this paper, such practice can be regarded as an implementation of
the neoclassical theory s.t.


• the generic proxy $\theta_{T,G}^*$ is a random variable that has the same unconditional distribution
as $\hat{\theta}_{T,G}^*$, i.e., $\theta_{T,G}^* = F^{-1}_{\hat{\theta}_{T,G}^*}(U_G)$ where $F^{-1}_{\hat{\theta}_{T,G}^*}$ denotes the inverse of the
unconditional c.d.f. of $\hat{\theta}_{T,G}^*$, and $U_G$ a random variable uniformly distributed on
[0,1] that is independent from the data;

• the approximation of the unconditional distribution of the proxy is the Gaussian
distribution centered at $\hat{\theta}_{T,G}^*(\omega)$ with variance-covariance matrix the diagonal
matrix of the diagonal elements of $\Sigma_T$, i.e., for all $\theta \in \Theta$,


fe‘^{9)ocexp -^(0 - G(^))'diag(ST(a;)) ^(^-^r,G(^)) 


( 4 ) 
with $\mu = \lambda$, and where $\mathrm{diag}(\Sigma_T(\omega))$ is the diagonal matrix that has the same
diagonal elements as $\Sigma_T(\omega)$.

In the Appendix, we show that the approximation (4) is consistent under Assumptions
as and 1^ (see Proposition in Appendix F.2).

5.2.3. Laplace approximation and "Bayesian" practice. In most econometric practices,
the unknown parameter $\theta_0$ is approximated by a maximizer of a converging objective
function $Q_T(X_{1:T}, \theta)$, i.e.,

$\hat{\theta}_T^* \in \arg\max_{\theta \in \Theta} Q_T(X_{1:T}, \theta)$ (5)

where, as $T \to \infty$, $\sup_{\theta \in \Theta} \|Q_T(X_{1:T}, \theta) - Q(\theta)\| = o_P(1)$ with $\theta_0 = \arg\max_{\theta \in \Theta} Q(\theta)$.
See Bierens (1981), Amemiya (1985, chap. 4), Gallant and White (1988), Newey and
McFadden (1994), and Potscher and Prucha (1997), all of which follow from earlier contributions
by Wald (1950), Malinvaud (1964/1970, chap. 9; 1970), and Jennrich (1969).
If $TQ_T(X_{1:T}, \theta)$ is a log-likelihood $L_T(X_{1:T}, \theta)$, for all $x_{1:T}$ in the sample space of the
data and $\theta \in \Theta$, the function

$(x_{1:T}, \theta) \mapsto e^{L_T(x_{1:T}, \theta)}$ (6)

is numerically equal to the Bayesian distribution of the data $X_{1:T}$ conditional on $\theta_0$,
denoted $\pi_{X_{1:T}|\theta_0}(x_{1:T}|\theta)$ (see subsection 3.1). Thus, by analogy, often motivated by the
Laplace approximation, several papers (e.g., Zellner, 1997; Kim, 2002; Yin, 2009) have
used the function

$(x_{1:T}, \theta) \mapsto e^{TQ_T(x_{1:T}, \theta)}$ (7)

in lieu of $\pi_{X_{1:T}|\theta_0}(.|.)$, even when the former is not numerically equal to the latter. Then,
they consider that the Bayesian posterior distribution $\pi_{\theta_0|X_{1:T}}(\theta|X_{1:T}(\omega))$ equals

$\frac{e^{TQ_T(X_{1:T}(\omega), \theta)}\, w(\theta)}{\int_\Theta e^{TQ_T(X_{1:T}(\omega), \dot{\theta})}\, w(\dot{\theta})\lambda(d\dot{\theta})}$ (8)

where $w : \Theta \to \mathbb{R}_+$ is a function that they regard as the Bayesian prior $\pi_{\theta_0}(.)$. As
previously noticed (e.g., Chernozhukov and Hong, 2003), such a practice is not in line
with Bayesian theory, even if we set aside the invalidation due to multiple use of the
same data: approximations are not compatible with Bayesian inference theory, which
requires an econometrician to know exactly the distribution of the data conditional on
the true parameter (Savage, 1954/1972, pp. 59-60). However, such a practice is in line
with the neoclassical inference theory whether the function (7) is numerically equal to
$\pi_{X_{1:T}|\theta_0}(.|.)$ or not.


Under general assumptions, a strand of literature that goes back at least to Laplace
(1774/1878) has shown that (8) is a consistent approximation of the unconditional distribution
of the proxy. See literature on consistency of Bayesian posteriors (e.g., Doob, 1949),
and the Bernstein-von Mises theorem (e.g., Le Cam, 1953, 1958; Chen, 1985; Kim, 1998;
Chernozhukov and Hong, 2003), which implies consistency under $\mathbb{P}$ (see Appendix F.2.3
on p. 62). We distinguish three types of Laplace approximations: plain Laplace approximation,
weighted Laplace approximation, and criterion-adjusted weighted Laplace
approximation. For brevity, we do not treat the plain Laplace approximation separately
from the weighted Laplace approximation, as the former is a particular case of the latter
with $w(\theta) = 1$, for all $\theta \in \Theta$.

Plain and weighted Laplace approximation. The neoclassical theory justifies practices
that treat (8) as a Bayesian posterior, and report the counterpart of a mode and of an
HPD region. Such practices can be seen as an implementation of the neoclassical theory
s.t.

• the generic proxy $\theta_{T,WL}^*$ is a random variable that has the same unconditional
distribution as $\hat{\theta}_{T,WL}^* := \arg\max_{\theta \in \Theta} e^{TQ_T(X_{1:T}, \theta)} w(\theta)$, i.e., $\theta_{T,WL}^* = F^{-1}_{\hat{\theta}_{T,WL}^*}(U_{WL})$
where $F^{-1}_{\hat{\theta}_{T,WL}^*}$ denotes the inverse of the unconditional c.d.f. of $\hat{\theta}_{T,WL}^*$, and $U_{WL}$ a
random variable uniformly distributed on [0,1] that is independent from the data;

• the approximation of the unconditional distribution of the proxy is the expression
(8) viewed as a function of $\theta$, i.e., for all $\theta \in \Theta$,

From a neoclassical point of view, $w(.)$ weights the evidence from data. Mathematically,
it corresponds to a change of measure from the plain Laplace approximation, $\hat{\mathbb{P}} \circ \theta_{T,L}^{*-1}$,
to the weighted Laplace approximation, $\hat{\mathbb{P}} \circ \theta_{T,WL}^{*-1}$, i.e., for all $\theta \in \Theta$,

$w(\theta) = \frac{d(\hat{\mathbb{P}} \circ \theta_{T,WL}^{*-1})}{d(\hat{\mathbb{P}} \circ \theta_{T,L}^{*-1})}(\theta)$.

The weighting function $w(.)$ allows one to incorporate additional information in the proxy of
$\theta_0$. While, in Bayesian theory, the dependence of a prior on data is typically problematic,
in the neoclassical theory, the weighting function $w(.)$ can depend on the data. The
neoclassical theory only requires $\hat{\theta}_{T,WL}^*$ and the integral of (8) to be considered a proxy of
$\theta_0$, and an approximation of $\mathbb{P} \circ \theta_{T,WL}^{*-1}$, respectively. Thus, in particular, the neoclassical
theory provides a theoretical foundation to the practice called parametric empirical Bayes
(Morris, 1983). Petrone, Rousseau, Scricciolo (2014) present conditions under which the
weighted Laplace approximation (8) is a consistent approximation of $\mathbb{P} \circ \theta_{T,WL}^{*-1}$ when
$w(.)$ depends on data through an estimated hyperparameter.
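
The weighted Laplace approximation can be evaluated on a grid. The following sketch relies on hypothetical choices that are not in the paper: a Gaussian objective with $TQ_T(x_{1:T}, \theta) = -\frac{1}{2}\sum_t (x_t - \theta)^2$, a small made-up data set, and two weighting functions (the constant one, which gives the plain Laplace approximation, and a Gaussian-shaped one). It normalizes $e^{TQ_T} w$ numerically and reports the mode, i.e., the neoclassical estimate under this approximation.

```python
import math


def weighted_laplace(grid, data, weight):
    """Normalized exp(T * Q_T(data, theta)) * w(theta) on an equally spaced grid."""
    step = grid[1] - grid[0]
    # T * Q_T(data, theta) = -0.5 * sum_t (x_t - theta)^2, plus the log of the weight
    log_vals = [-0.5 * sum((x - th) ** 2 for x in data) + math.log(weight(th)) for th in grid]
    top = max(log_vals)                          # subtract the maximum for numerical stability
    raw = [math.exp(v - top) for v in log_vals]
    total = sum(raw) * step
    return [r / total for r in raw]


def mode(grid, density):
    """Grid point at which the density is highest."""
    return grid[max(range(len(grid)), key=density.__getitem__)]


data = [0.8, 1.1, 1.4, 0.9, 1.3]                 # hypothetical observations
step = 0.001
grid = [-2 + step * i for i in range(6001)]      # grid on [-2, 4]

plain = weighted_laplace(grid, data, weight=lambda th: 1.0)
weighted = weighted_laplace(grid, data, weight=lambda th: math.exp(-0.5 * th ** 2))
print(mode(grid, plain), mode(grid, weighted))
```

In this Gaussian design the plain mode is the sample mean, while the Gaussian-shaped weight pulls the mode toward zero (to $\sum_t x_t/(T+1)$), which illustrates how $w(.)$ weights the evidence from the data.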

Criterion-adjusted weighted Laplace approximation. When (8) is treated as if it was a
Bayesian posterior, the counterpart of a mode and of an HPD region are not always reported.
Instead, the econometrician chooses a utility function (i.e., opposite of a loss
function), $u : (\theta_e, \theta) \mapsto u(\theta_e, \theta)$, and then maximizes w.r.t. $\theta_e$ the expected utility
$\int_\Theta u(\theta_e, \theta)\,\pi_{\theta_0|X_{1:T}}(\theta|X_{1:T}(\omega))\lambda(d\theta)$. If $u$ is nonnegative (see upcoming Remark 16),
such a practice can be seen as an implementation of the neoclassical theory s.t.

the generic proxy is 9^TCWL ■= where JO := 

fj j — ;W (d0 ), and where U^wl is a random variable uniformly dis- 
tributed on [0,1] that is independent from the data and 0 ^ vkl! 
the approximation of the distribution of the generic proxy $\theta_{T,CWL}^*$ is the expected
criterion function $u$, i.e., for all $\theta \in \Theta$,

$\hat{f}_{\theta_{T,CWL}^*}(\theta) \propto \int_\Theta u(\theta, \dot{\theta})\, e^{TQ_T(X_{1:T}(\omega), \dot{\theta})}\, w(\dot{\theta})\lambda(d\dot{\theta})$ with $\mu = \lambda$. (9)

In the Appendix, we show that the criterion-adjusted weighted Laplace approximation (9)
is consistent under general assumptions. The Bayesian approach and the neoclassical
approach based on the criterion-adjusted weighted Laplace approximation (9) are fundamentally
different, although they are numerically equivalent when $TQ_T(X_{1:T}, \theta)$ is a
log-likelihood (and the data have not been previously used). For Bayesian inference theory,
an econometrician faces a known probabilized uncertainty of a random parameter $\theta_0$,
while from a neoclassical perspective an econometrician faces an estimated probabilized
uncertainty of a random proxy of a constant parameter $\theta_0$. Thus, the neoclassical theory
acknowledges the existence of an "unmeasurable" uncertainty described by Knight (1921,
chap. 7-8).


Remark 16. In this paper, we assume nonnegative criterion functions to remain within the
framework of the elementary version of the neoclassical theory presented in section 4.2. This
requirement limits the type of neoclassical confidence regions considered in this paper.
Nevertheless, under the mild assumption that $\inf_{(\theta_e,\theta) \in \Theta^2} u(\theta_e, \theta) > -\infty$, this requirement
is without loss of generality for neoclassical point estimation: we can define the criterion
function $\tilde u(\theta_e, \theta) = u(\theta_e, \theta) - \inf_{(\theta_e,\theta) \in \Theta^2} u(\theta_e, \theta)$, which yields the same point estimate. ⋄


Remark 17. As explained in the introduction of this subsection, our presentation does 
not present hybrid practices, so that we do not explicitly cover the diversity of the econo¬ 
metric practices labelled Bayesian. However, they appear to also be implementations of 
the elementary version of the neoclassical theory presented in subsection 4.2. For
example, reporting the HPD region of a Bayesian posterior with its mean can be seen 
as an implementation of the neoclassical theory s.t. the generic proxy is 
approximation of its distribution is 




A* (^) oc 


w{9) 


oo 


if 0 G 0 \ {9t} 
ii 9 = 9 t 


It := IJ ^ 

equal-tailed 68% confidence interval of a Bayesian posterior with its mean can be seen
as an implementation of the neoclassical theory s.t. the generic proxy is
approximation of its distribution is the product of $p$ Gaussian p.d.f. centered at the
midpoint of the $k$-th interval with a standard deviation approximately equal to half of
the length of the same interval, i.e.,

I 11^=1 if 6^ G © \ 

fe-^i.0) oc I 

I cx) if 6^ = 6*^ 

with jU = A, and where [a^, hk] denotes the reported intervals, Sk^r ~ a^d n(6'; r; s) 

is a Gaussian p.d.f. with expectation r and standard deviation s. o 


5.3. Assessment of the Gaussian approximation and a standard-error adjustment.
From a neoclassical point of view, the issue raised by multiple use of the same data
boils down to the question of approximation errors. This subsection studies the average
effect of approximation errors, and develops a standard-error adjustment to account for
it. For brevity and relevance, we focus on the Gaussian approximation, which corresponds
to a large part of econometric practice. Assessments of other approximations are left for
future research.


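The basic algorithm of our Monte-Carlo simulations is the following.

For $m = 1, 2, 3, \ldots, M$

(1) Draw i.i.d. data $X^{(m)}_{1:T}(\omega) := (X^{(m)}_t(\omega))_{t=1}^{T}$;

(2) Compute

• $\theta^{*(m)}_T(\omega) = \bar X^{(m)}_T(\omega)$;

• $F^{(m)}_{\theta^*_T}(\theta) = \Phi\big(\theta; \bar X^{(m)}_T(\omega); s^{(m)}_T(\omega)/\sqrt{T}\big)$;

• $f^{(m)}_{\theta^*_T}(\theta) = n\big(\theta; \bar X^{(m)}_T(\omega); s^{(m)}_T(\omega)/\sqrt{T}\big)$, where $s^{(m)}_T(\omega) := \sqrt{\frac{1}{T-1}\sum_{t=1}^{T} \big[X^{(m)}_t(\omega) - \bar X^{(m)}_T(\omega)\big]^2}$.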
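The simulation loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: it covers only the Gaussian design, the function name `average_coverage` and its parameters are hypothetical, and the multiplier `eta` on the standard error is introduced here only so that the same function can be run with and without a rescaled standard error. Because the data are Gaussian, the law of the sample mean is known exactly, so the probability that the generic proxy falls in each interval is computed in closed form rather than by a second layer of simulation.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, stdev


def average_coverage(T=20, sigma=0.2, theta0=0.0, level=0.95,
                     M=2000, eta=1.0, seed=0):
    """Average probability that the generic proxy (a sample mean) lies in
    the interval built from a Gaussian approximation N(xbar, s / sqrt(T)).

    eta is a hypothetical multiplier applied to the standard error.
    """
    rng = random.Random(seed)
    u = NormalDist().inv_cdf(1 - (1 - level) / 2)   # Gaussian quantile u_{1-alpha/2}
    proxy_law = NormalDist(theta0, sigma / sqrt(T))  # exact law of the sample mean
    total = 0.0
    for _ in range(M):
        x = [rng.gauss(theta0, sigma) for _ in range(T)]  # step (1): draw data
        xbar = fmean(x)                                   # step (2): proxy realization
        se = eta * stdev(x) / sqrt(T)                     # stdev uses the T-1 divisor
        total += proxy_law.cdf(xbar + u * se) - proxy_law.cdf(xbar - u * se)
    return total / M


if __name__ == "__main__":
    print(average_coverage(eta=1.0))      # standard error left as is
    print(average_coverage(eta=sqrt(2)))  # standard error multiplied by sqrt(2)
```

With the default arguments, the first figure should fall visibly below the nominal .95 and the second should be much closer to it.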

We draw data either from a Gaussian distribution, or from a Bernoulli distribution.
Both families of distributions are interesting for different reasons. Data from a Bernoulli
distribution are known to be relatively challenging for Gaussian approximations, especially
when the Bernoulli parameter is close to 0 or 1 (e.g., Agresti and Coull, 1998; Brown,
Table 1. Monte-Carlo assessment of Gaussian approximations
for M = 10000. RMSE is the square root of the mean-square
error.^a The columns $L^2$ and sup, respectively, correspond to
the Monte-Carlo averages of the $L^2$ distance (w.r.t. $\lambda$) and of the
sup distance between the c.d.f. of the Gaussian approximation and the
c.d.f. of $\mathbb{P} \circ \theta_T^{*-1}$. The columns "Unadjusted" and "Adjusted" report
the probability that the neoclassical confidence region of the stated
level contains the generic proxy, without and with the standard-error
adjustment.

                RMSE        RMSE                           Unadjusted                  Adjusted
  T   Data      $\bar X_T$  $s_T/\sqrt{T}$  $L^2$   sup     .68   .90   .95   .99     .68     .90   .95   .99

  20  N(0,.2)   .045^b      .007^b          .089    .311    .5    .732  .811  .912    .658    .878  .932  .981
  50            .028        .003            .07     .303    .51   .746  .825  .924    .671    .892  .943  .987
 100            .02         .001            .058    .299    .515  .752  .831  .929    .677    .897  .948  .989

  20  N(0,.4)   .09         .015            .126    .311    .5    .732  .811  .912    .658    .878  .932  .981
  50            .057        .006            .099    .303    .51   .746  .825  .924    .671    .892  .943  .987
 100            .04         .003            .082    .299    .515  .752  .831  .929    .677    .897  .948  .989

  20  B(p)^c    .056        .043            .117    .37     .345  .638  .677  .726    .568    .725  .728  .74
  50            .035        .027            .083    .405    .48   .74   .794  .885    .632    .87   .89   .937
 100            .025        .019            .067    .374    .501  .741  .824  .918    .683    .881  .929  .971

  20  B(.5)     .113        .053            .143    .375    .566  .729  .838  .911    .686    .88   .931  .981
  50            .071        .035            .111    .346    .515  .728  .806  .922    .631^d  .898  .942  .987
 100            .05         .025            .093    .332    .474  .768  .82   .922    .682    .895  .943  .988

^a We do not report the bias as we know that $\bar X_T$ and $s_T^2$ are unbiased (e.g., Gourieroux and Monfort, 1989/1996,
Example 6.4). ^b RMSE($s_T/\sqrt{T}$) = RMSE($s_T$)/$\sqrt{T}$, which elucidates why RMSE($s_T/\sqrt{T}$) < RMSE($\bar X_T$). ^c As in the
Gaussian case, this Bernoulli parameter is chosen to halve the standard deviation of the other Bernoulli.
^d The non-monotonic convergence to .68 is due to the discontinuities induced by the Bernoulli
distribution. See for example Brown, Cai and DasGupta (2002) for a similar phenomenon.


Cai and DasGupta, 2002 and references therein). Data from a Gaussian distribution
neutralize the part of the approximation error coming from the distribution family: the
average of Gaussian random variables is also a Gaussian random variable.

Table 1 shows that, in both cases, Gaussian approximations globally converge in terms
of the first two moments, of the $L^2$ norm, and of the sup norm. However, the probability
of neoclassical confidence regions to contain the generic proxy appears downward biased.
Proposition 1 formalizes this downward bias, and proposes an asymptotic adjustment for
it.


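Proposition 1 (Downward bias and standard-error adjustment). Under Assumptions^
4 and 6, if $\hat\Sigma_T \overset{\mathbb{P}}{\to} \Sigma$ as $T \to \infty$, then, for all $\mathcal{E}_T \subset \sigma(X_{1:T})$ and $k \in \{1, \ldots, p\}$,

i) $$\lim_{T\to\infty} \mathbb{E}\left\{ \mathbb{P}\left( \theta^*_{T,k} \in \left[ \hat\theta^*_{T,k} + \frac{\hat s_{T,k,k}}{\sqrt{T}} u_{\alpha/2},\ \hat\theta^*_{T,k} + \frac{\hat s_{T,k,k}}{\sqrt{T}} u_{1-\alpha/2} \right] \,\Big|\, \mathcal{E}_T \right) \right\} < 1 - \alpha;$$

ii) $$\lim_{T\to\infty} \mathbb{E}\left\{ \mathbb{P}\left( \theta^*_{T,k} \in \left[ \hat\theta^*_{T,k} + \sqrt{2}\,\frac{\hat s_{T,k,k}}{\sqrt{T}} u_{\alpha/2},\ \hat\theta^*_{T,k} + \sqrt{2}\,\frac{\hat s_{T,k,k}}{\sqrt{T}} u_{1-\alpha/2} \right] \,\Big|\, \mathcal{E}_T \right) \right\} = 1 - \alpha;$$

where $\hat\theta^*_{T,k}$, $\hat s_{T,k,k}$ and $u_{\alpha/2}$, respectively, denote the $k$-th element of the random vector $\hat\theta^*_T$,
the $k$-th diagonal element of the matrix $\hat\Sigma_T^{1/2}$, and the $\alpha/2$ quantile of a standard univariate
Gaussian $\mathcal{N}(0,1)$.

Figure 1. Examples of approximation errors for $\mathcal{N}(0, .4)$ data and
$T = 20$. The solid black line is $\mathbb{P} \circ \theta_T^{*-1}$ while the others are realizations of
its Gaussian approximation. Vertical dashed lines correspond
to the 95% neoclassical confidence region from $\mathbb{P} \circ \theta_T^{*-1}$.
[Plot omitted; horizontal axis: $\theta$.]
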
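Proof. It is an immediate consequence of the asymptotic normality of $\theta^*_T$. For all $\eta > 0$,
by iterated conditioning,

$$
\begin{aligned}
&\lim_{T\to\infty} \mathbb{E}\left\{ \mathbb{P}\left( \theta^*_{T,k} \in \left[ \hat\theta^*_{T,k} + \eta\,\frac{\hat s_{T,k,k}}{\sqrt{T}} u_{\alpha/2},\ \hat\theta^*_{T,k} + \eta\,\frac{\hat s_{T,k,k}}{\sqrt{T}} u_{1-\alpha/2} \right] \,\Big|\, \mathcal{E}_T \right) \right\} \\
&\quad = \lim_{T\to\infty} \mathbb{P}\left( \theta^*_{T,k} \in \left[ \hat\theta^*_{T,k} + \eta\,\frac{\hat s_{T,k,k}}{\sqrt{T}} u_{\alpha/2},\ \hat\theta^*_{T,k} + \eta\,\frac{\hat s_{T,k,k}}{\sqrt{T}} u_{1-\alpha/2} \right] \right) \\
&\quad \overset{(a)}{=} \lim_{T\to\infty} \mathbb{P}\left( \left| \frac{\sqrt{T}\,(\theta^*_{T,k} - \hat\theta^*_{T,k})}{\eta\, \hat s_{T,k,k}} \right| \leq u_{1-\alpha/2} \right) \\
&\quad \overset{(b)}{=} \lim_{T\to\infty} \mathbb{P}\left( \left| \frac{1}{\eta} \frac{\sqrt{T}\,(\theta^*_{T,k} - \theta_{0,k})}{\hat s_{T,k,k}} + \frac{1}{\eta} \frac{\sqrt{T}\,(\theta_{0,k} - \hat\theta^*_{T,k})}{\hat s_{T,k,k}} \right| \leq u_{1-\alpha/2} \right) \\
&\quad \overset{(c)}{=} \mathbb{P}\left( \left| \mathcal{N}\left(0, \sqrt{2}/\eta\right) \right| \leq u_{1-\alpha/2} \right).
\end{aligned}
$$

• $\mathbb{P}\left( |\mathcal{N}(0, \sqrt{2})| \leq u_{1-\alpha/2} \right) < 1 - \alpha$ if $\eta = 1$, so that it yields i);

• $\mathbb{P}\left( |\mathcal{N}(0, 1)| \leq u_{1-\alpha/2} \right) = 1 - \alpha$ if $\eta = \sqrt{2}$, so that it yields ii).

(a) On one hand, $\hat\theta^*_{T,k} + \eta \frac{\hat s_{T,k,k}}{\sqrt{T}} u_{\alpha/2} \leq \theta^*_{T,k}$ if, and only if, $-u_{1-\alpha/2} = u_{\alpha/2} \leq \frac{\sqrt{T}(\theta^*_{T,k} - \hat\theta^*_{T,k})}{\eta\, \hat s_{T,k,k}}$. On the other hand,
similarly, $\theta^*_{T,k} \leq \hat\theta^*_{T,k} + \eta \frac{\hat s_{T,k,k}}{\sqrt{T}} u_{1-\alpha/2}$ if, and only if, $\frac{\sqrt{T}(\theta^*_{T,k} - \hat\theta^*_{T,k})}{\eta\, \hat s_{T,k,k}} \leq u_{1-\alpha/2}$. (b) Add and subtract $\theta_{0,k}$. (c)
Under the assumptions of the proposition, by the continuous mapping theorem (e.g., Kallenberg, 1997/2002,
Lemma 4.3), as $T \to \infty$, the two terms inside the absolute value converge jointly in distribution to two
independent Gaussian random variables, each with expectation 0 and standard deviation $1/\eta$, so that their
sum is Gaussian with expectation 0 and standard deviation $\sqrt{2}/\eta$.

□
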


Proposition 1 i) shows that the downward bias holds under general assumptions, independently
of the sub-sigma algebra of the data we condition on. Figure 1 illustrates
the reason for this downward bias: the Gaussian approximation does not account for the
fact that its mean and standard deviation are not known, but estimated, so that there
is an approximation error. Proposition 1 ii) shows that multiplying the standard error by
$\sqrt{2}$ asymptotically accounts for the average approximation error, independently of the
sub-sigma algebra of the data we condition on. The RHS columns of Table 1 suggest that
this asymptotic adjustment is effective in finite sample. The proof of Proposition 1 formalizes
the rationale behind the adjustment: asymptotically, after centering and scaling
by $\sqrt{T}$, the average approximation error exactly corresponds to the uncertainty about $\theta^*_T$,
so that the variance is doubled by independence, which means that the standard error is
multiplied by $\sqrt{2}$.

Figure 2. Relation between nominal level and adjusted nominal level of a test.
[Plot omitted; horizontal axis: nominal level.]

As can be seen from Figure 2, the adjustment has a nonlinear effect on confidence
region and test levels. The adjustment has a stronger effect in the tails because Gaussian
distributions are exponentially decreasing in the tails. Tables 2 and 3 are conversion
tables that document the effect of the adjustment at conventional levels. They should
be of special interest to applied econometricians. Table 2 shows that tests at nominal
levels .01, .05 and .1 are tests at approximate adjusted nominal levels .069, .166 and .245,
respectively. Conversely, Table 3 shows that tests at adjusted nominal levels .01, .05, and
.1 respectively require the non-adjusted p-values computed by standard software to be
approximately below $.027(10^{-2})$, $.056(10^{-1})$, and .020 for rejection of the test hypothesis.
Table 4 shows the effect of the adjustment on critical values for the non-adjusted t-values
computed by standard software. These tables shed a new light on results published
in nonexperimental fields. In particular, in view of the data collected by Brodeur, Le,
Sangnier and Zylberberg (2013, Figure I), the adjustment appears to affect the significance
at conventional levels of many results in the economic literature.
Table 2. From nominal levels to adjusted nominal levels.

Nominal level                  .01    .05    .1     .32
Adj. nominal level (appr.)     .069   .166   .245   .482

Adj. and appr. respectively stand for adjusted and approximation.


Table 3. From adjusted nominal levels to nominal levels, and adjusted
critical values for non-adjusted t-statistics.

Adj. nominal level             .01              .05              .1     .32
Nominal level (appr.)          $.027(10^{-2})$  $.056(10^{-1})$  .020   .16

Adj. and appr. respectively stand for adjusted and approximation.


Table 4. From non-adjusted critical values to adjusted critical values for
non-adjusted t-statistics.

Non-adjusted critical values   2.58   1.96   1.64   .99
Adj. critical values (appr.)   3.64   2.77   2.33   1.41

Adj. and appr. respectively stand for adjusted and approximation.
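
The entries of Tables 2 to 4 can be recomputed from the standard Gaussian c.d.f. and quantile function alone. The sketch below assumes two-sided tests and the asymptotic $\sqrt{2}$ adjustment of Proposition 1; the function names are hypothetical and chosen only for this illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Gaussian N(0, 1)


def adjusted_level(nominal_level):
    """Adjusted level of a two-sided test run at a given nominal level."""
    u = STD_NORMAL.inv_cdf(1 - nominal_level / 2)  # non-adjusted critical value
    return 2 * (1 - STD_NORMAL.cdf(u / sqrt(2)))


def nominal_level_for(adjusted):
    """Nominal level (non-adjusted p-value threshold) that delivers a
    given adjusted level."""
    u = STD_NORMAL.inv_cdf(1 - adjusted / 2)
    return 2 * (1 - STD_NORMAL.cdf(sqrt(2) * u))


def adjusted_critical_value(nominal_level):
    """Critical value to apply to a non-adjusted t-statistic."""
    return sqrt(2) * STD_NORMAL.inv_cdf(1 - nominal_level / 2)


if __name__ == "__main__":
    for level in (0.01, 0.05, 0.1, 0.32):
        print(level,
              round(adjusted_level(level), 3),
              nominal_level_for(level),
              round(adjusted_critical_value(level), 2))
```

The two conversions are inverses of each other, so `nominal_level_for(adjusted_level(a))` returns `a` up to floating-point error.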


6. Conclusion 

In nonexperimental fields, it is inescapable to compute test statistics and confidence 
regions that are not probabilistically independent from previously examined data. It 
has been known for decades that Neyman-Pearson and Bayesian inference theories are 
inadequate for such a practice. This paper recalls these inadequacies, and formally shows 
that they also hold m.a.e. A novel inadequacy of the Neyman-Pearson theory for past- 
realized data is also established. Then, a general inference theory compatible with multiple 
use of the same data m.a.e. is outlined. We call it the neoclassical inference theory. 

The starting point of the neoclassical theory is the acknowledgement that econometric
inference relies on the use of a sample counterpart of the unknown parameter $\theta_0$ as a
proxy for the latter one. Then, the idea is to base inference on an approximation of
the unconditional distribution of the proxy. By definition, the unconditional distribution
of the proxy is about all the possible values of the proxy induced by all the possible
samples that could have been observed. Thus, neoclassical inference does not depend on
the realized data m.a.e. Therefore, if we set aside approximation errors, the neoclassical
theory explains why econometric inference can rely on multiple use of the same data.
The other side of the coin is that, from a neoclassical point of view, the issue raised by
multiple use of the same data boils down to the question of the approximation errors, which
is the topic of most of the econometric and statistical literature. Nevertheless, Monte-Carlo
simulations show that finding accurate approximations is not sufficient. Even when
the approximation method is known to be accurate, errors can have a consequential effect
on tests and confidence regions. In particular, we prove that the Gaussian approximation
yields a downward bias in the probability of neoclassical confidence regions to contain
the generic proxy. Thus, we derive a general, but simple asymptotic standard-error adjustment
to remove this bias. Monte-Carlo simulations suggest that the adjustment is
effective in finite sample. However, more work would be needed to study the impact
of approximation errors in other situations. The authors have work in progress in that
direction.

Beyond the question of multiple use of the same data, the neoclassical inference theory
is promising. The neoclassical inference theory sheds a new light on foundational
and methodological debates in statistics, economics and finance (e.g., calibration vs. estimation,
and Bayesian inference vs. classical inference). The Example and section
show that the neoclassical theory provides a unifying framework for model calibration
and several common econometric practices, whether they are labelled Bayesian or à la
Neyman-Pearson. Moreover, work in progress by the authors indicates that the version
of the neoclassical theory developed in this paper is generalizable.
Appendix A. Consistency adequacy

The following proposition shows that multiple use of the same data does not affect the
consistency of a point estimator. In this Appendix, m.a.e. means that we always consider
the asymptotic limit to be exact for any sample size $T$. Nevertheless, for simplicity,
we also exclude the case à la Hoffmann-Jørgensen (see Wellner and van der Vaart, 1996)
in which finite-sample statistics do not need to be measurable.

Proposition 2 (Consistency adequacy). Let $\hat\theta_T$ be an estimator of $\theta_0$, i.e., a measurable
mapping from $(\Omega, \mathcal{E}_\Omega)$ to $(\Theta, \mathcal{E}_\Theta)$, where $\mathcal{E}_\Theta$ denotes a $\sigma$-algebra on $\Theta$. Under Assumption

0 

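i) if $\hat\theta_T$ is strongly consistent, then, for all $S \in \mathcal{S}_T$, $\{X_{1:T} \in S\}$ and $\{\hat\theta_T = \theta_0\}$ are
independent m.a.e., i.e.,

$$\mathbb{P}\left(\{\hat\theta_T = \theta_0\} \cap \{X_{1:T} \in S\}\right) = \mathbb{P}(\hat\theta_T = \theta_0)\, \mathbb{P}(X_{1:T} \in S) \quad \text{m.a.e.;}$$

ii) if $\hat\theta_T$ is weakly consistent, then, for all $S \in \mathcal{S}_T$ and for all neighborhood $N_{\theta_0}$ of
$\theta_0$, $\{X_{1:T} \in S\}$ and $\{\hat\theta_T \in N_{\theta_0}\}$ are independent m.a.e., i.e.,

$$\mathbb{P}\left(\{\hat\theta_T \in N_{\theta_0}\} \cap \{X_{1:T} \in S\}\right) = \mathbb{P}(\hat\theta_T \in N_{\theta_0})\, \mathbb{P}(X_{1:T} \in S) \quad \text{m.a.e.}$$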

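Proof. It is definition chasing. By a standard property of probability, for all $A, B \in \mathcal{E}_\Omega$,
$\mathbb{P}(A \cup B) = \mathbb{P}(A) + \mathbb{P}(B) - \mathbb{P}(A \cap B)$. Thus, if $\mathbb{P}(A) = 1$, adding $\mathbb{P}(A \cap B) - 1$ on both
sides yields

$$\mathbb{P}(A \cap B) = \mathbb{P}(B) = \mathbb{P}(A)\,\mathbb{P}(B) \qquad (10)$$

because $1 = \mathbb{P}(A) \leq \mathbb{P}(A \cup B) \leq 1$.

i) By definition of strong consistency, $\mathbb{P}(\lim_{T\to\infty} \hat\theta_T = \theta_0) = 1$, which means that
$\mathbb{P}(\hat\theta_T = \theta_0) = 1$ m.a.e. Then, apply (10) with $A = \{\hat\theta_T = \theta_0\}$ and $B = \{X_{1:T} \in S\}$.

ii) By definition of weak consistency, for all neighborhood $N_{\theta_0}$ of $\theta_0$, $\lim_{T\to\infty} \mathbb{P}(\hat\theta_T \in N_{\theta_0}) = 1$,
which means that $\mathbb{P}(\hat\theta_T \in N_{\theta_0}) = 1$ m.a.e. Then, apply (10) with $A = \{\hat\theta_T \in N_{\theta_0}\}$
and $B = \{X_{1:T} \in S\}$. □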
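
Equation (10) is easy to check by brute force on a small example. The sketch below uses a hypothetical four-point probability space (the outcome labels and masses are assumptions made for illustration) with exact rational arithmetic, and verifies that an event of probability one is independent of every event, whereas an event of probability one half is not.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical finite probability space; outcome "d" carries zero mass.
MASS = {"a": Fraction(1, 2), "b": Fraction(1, 3),
        "c": Fraction(1, 6), "d": Fraction(0)}


def prob(event):
    """Probability of an event, i.e., of a set of outcomes."""
    return sum((MASS[w] for w in event), Fraction(0))


def all_events():
    """Every subset of the sample space (the power set)."""
    outcomes = list(MASS)
    subsets = chain.from_iterable(
        combinations(outcomes, r) for r in range(len(outcomes) + 1))
    return [frozenset(s) for s in subsets]


def independent_of_every_event(A):
    """True if P(A and B) = P(A) P(B) for all events B."""
    return all(prob(A & B) == prob(A) * prob(B) for B in all_events())


if __name__ == "__main__":
    sure_event = frozenset({"a", "b", "c"})  # probability one, yet not the whole space
    print(prob(sure_event), independent_of_every_event(sure_event))
    print(independent_of_every_event(frozenset({"a"})))
```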


Remark 18. Inspection of the proof shows that consistency is independent of any event,
i.e., we can replace $\{X_{1:T} \in S\}$ by any event $E \in \mathcal{E}_\Omega$ in the proof, and thus in the
statement of Proposition 2. ⋄

Appendix B. Details of the proof of Theorem 1

In this appendix, we provide a detailed proof of Theorem 1, i.e., we provide details
regarding the qualification “m.a.e.” We only consider the case à la Hoffmann-Jørgensen,
as the standard asymptotic case follows easily from it.

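Proof. i) $\{X_{1:T} \in A_T\}$ and $\{\theta_0 \in C_{1-\alpha,T}(X_{1:T})\}$ are not independent m.a.e. if, and only
if,

$$
\begin{aligned}
& \liminf_{T\to\infty} \mathbb{P}_*\left(\{\theta_0 \in C_{1-\alpha,T}(X_{1:T})\} \cap \{X_{1:T} \in A_T\}\right) \neq \liminf_{T\to\infty} \mathbb{P}_*\left(\theta_0 \in C_{1-\alpha,T}(X_{1:T})\right) \mathbb{P}(X_{1:T} \in A_T) \\
\overset{(a)}{\iff}\ & \liminf_{T\to\infty} \frac{\mathbb{P}_*\left(\{\theta_0 \in C_{1-\alpha,T}(X_{1:T})\} \cap \{X_{1:T} \in A_T\}\right)}{\mathbb{P}(X_{1:T} \in A_T)} \neq \liminf_{T\to\infty} \mathbb{P}_*\left(\theta_0 \in C_{1-\alpha,T}(X_{1:T})\right) \\
\overset{(b)}{\iff}\ & \liminf_{T\to\infty} \mathbb{P}_*\left(\theta_0 \in C_{1-\alpha,T}(X_{1:T}) \mid X_{1:T} \in A_T\right) \neq \liminf_{T\to\infty} \mathbb{P}_*\left(\theta_0 \in C_{1-\alpha,T}(X_{1:T})\right) \\
\iff\ & \mathbb{P}\left(\theta_0 \in C_{1-\alpha,T}(X_{1:T}) \mid X_{1:T} \in A_T\right) \neq \mathbb{P}\left(\theta_0 \in C_{1-\alpha,T}(X_{1:T})\right) \quad \text{m.a.e.}
\end{aligned}
$$

(a) By assumption, $\mathbb{P}(X_{1:T} \in A_T) = c > 0$. (b) Write $D_T := \{\theta_0 \in C_{1-\alpha,T}(X_{1:T})\} \cap \{X_{1:T} \in A_T\}$. Then

$$
\begin{aligned}
\frac{\mathbb{P}_*(D_T)}{\mathbb{P}(X_{1:T} \in A_T)}
&= \frac{\sup\{\mathbb{P}(E) : E \subset D_T \wedge E \in \mathcal{E}\}}{\mathbb{P}(X_{1:T} \in A_T)}
= \sup\left\{\frac{\mathbb{P}(E)}{\mathbb{P}(X_{1:T} \in A_T)} : E \subset D_T \wedge E \in \mathcal{E}\right\} \\
&= \sup\left\{\frac{\mathbb{P}(E \cap \{X_{1:T} \in A_T\})}{\mathbb{P}(X_{1:T} \in A_T)} : E \subset D_T \wedge E \in \mathcal{E}(X_{1:T} \in A_T)\right\} \\
&= \sup\left\{\mathbb{P}(E \mid X_{1:T} \in A_T) : E \subset D_T \wedge E \in \mathcal{E}(X_{1:T} \in A_T)\right\}
= \mathbb{P}_*\left(\theta_0 \in C_{1-\alpha,T}(X_{1:T}) \mid X_{1:T} \in A_T\right),
\end{aligned}
$$

where for all $A \in \mathcal{E}$, $\mathcal{E}(A) := \{B \cap A : B \in \mathcal{E}\}$.

ii) Replace in the proof of (i), $\{\theta_0 \in C_{1-\alpha,T}(X_{1:T})\}$, $\liminf$, and $\mathbb{P}_*$ by $\{d_T(X_{1:T}) = d_A\}$, $\limsup$,
and $\mathbb{P}^*$, respectively. □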

Remark 19. In the above proof, the assumption $\mathbb{P}(X_{1:T} \in A_T) = c > 0$ can be weakened
to $\lim_{T\to\infty} \mathbb{P}(X_{1:T} \in A_T) = c > 0$. However, the non-existence of the limit or the non-measurability
of the conditioning event $\{X_{1:T} \in A_T\}$ would make the proof more difficult.
In particular, in the latter case, we would need a conditional version of nonadditive outer
and inner measures, and there does not seem to be a consensus on this subject (e.g.,
Young and Wang, 1998). These difficulties do not affect the main conclusion of section]^ 
as they are the counterpart of the difficulties to establish the Neyman-Pearson validity of
a confidence region or a test. ⋄

Appendix C. Neyman-Pearson inadequacy for past-realized data 

As pointed out in an earlier remark, Theorem 1 can be viewed as a formalization of
the Neyman-Pearson inadequacy for past-realized data when only part of the data have
been realized before the determination of the confidence regions and tests. This appendix
formalizes this inadequacy in the case in which all data at use have been previously
realized. For simplicity, we rule out the case à la Hoffmann-Jørgensen in which finite-sample
statistics do not need to be measurable. We also require the following assumption
for the determination of the confidence intervals.

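Assumption 8. Let $\mathbb{P}$ be the probability measure on $(\Omega, \mathcal{E}_\Omega)$ s.t. $\mathbb{P} \circ X_{1:T}^{-1}$ is the unconditional
physical and unknown distribution of $X_{1:T}$. (a) There exists a mapping $G$ from
the space of all probability measures on $(\Omega, \mathcal{E}_\Omega)$ to the parameter space $\Theta$ s.t. $G(\mathbb{P}) := \theta_0$.
(b) There exists a family of probability measures $(\mathbb{P}_\theta)_{\theta \in \Theta}$ on $(\Omega, \mathcal{E}_\Omega)$ s.t., for all $\theta \in \Theta$,
$\theta = G(\mathbb{P}_\theta)$, and $\mathbb{P}$ is dominated by $\mathbb{P}_\theta$, i.e., $\mathbb{P} \ll \mathbb{P}_\theta$.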

Assumption 8 is often satisfied. Assumption 8(a) requires the parameter $\theta_0$ to depend on
the underlying probability measure that defines the distribution of the data $X_{1:T}$. Without
this assumption, it seems difficult to see how $\theta_0$ can be inferred from the data $X_{1:T}$.
Assumption 8(b) first requires Assumption 8(a) to hold independently of the location of
$\theta_0$ in $\Theta$. It corresponds to the idea that the parameter space $\Theta$ is the set of possible
values for $\theta_0$. Second, Assumption 8(b) requires all the measures in the family $(\mathbb{P}_\theta)_{\theta \in \Theta}$
to dominate the unknown probability measure $\mathbb{P}$. This can be restrictive. Nevertheless,
Assumption 8(b) is often satisfied, and it is weaker than some assumptions in the literature
on maximum-likelihood (e.g., Lehmann and Casella, 1983/1998, sec. 6.3; Gourieroux and
Monfort, 1989/1996, sec. 7.D.) or empirical processes (e.g., Khmaladze, 1981). These
references require, for all $(\theta, \dot\theta) \in \Theta^2$, the existence of a unique pair of probability measures
$(\mathbb{P}_\theta, \mathbb{P}_{\dot\theta})$ s.t. $\theta = G(\mathbb{P}_\theta)$, $\dot\theta = G(\mathbb{P}_{\dot\theta})$, and $\mathbb{P}_{\dot\theta}$ is equivalent to $\mathbb{P}_\theta$, i.e., $\mathbb{P}_{\dot\theta} \sim \mathbb{P}_\theta$. We require
Assumption 8(b) to ensure that, for all $\theta \in \Theta$, $\mathbb{P}_\theta$-null sets are also $\mathbb{P}$-null sets.

Under Assumption 8, the following proposition formalizes the Neyman-Pearson inadequacy
that arises when all data at use have been previously realized.

Proposition 3 (Neyman-Pearson inadequacy for past-realized data). i) Let a G [0,1[, 
and Ci-a,T{Xi:T) be a 1 — a pivotal Neyman-Pearson confidence region under 
(P 0 (.|Xi,r))ee©. (i) for all uj e Cl, ^ for oil 9 G 

{xi:T G Sj. : 6* G C'i_a,T(j^i:r)} ^ S_rp! ond (Hi) for all 6* G ©, Pe{9 G 
Gi_Q,,T(-^i:r)|^i:T) ^ 1 — 0 m.a.c. Under Assumptions^and^ ® 

P-a.s. m.a.e. 

ii) Letdr be a Neyman-Pearson test of level a G [0,1[ underF{.\Xi.,T) , he., P(dr(Wi:T) = 
<^a|W:t) ^ o m.a.e., z/H is true. Under Assumption^ z/H is true, dT{Xi,T) = dn 
P-a.s. m.a.e., which implies that there does not exist a Neyman-Pearson test of 
level a G [0,1[ under P(.|Xi.'r) s.t. the probability of type I error is nonnegative 
m.a.e. 

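Proof. i) By definition of conditional probabilities, for all $\theta \in \Theta$, $\mathbb{P}_\theta$-a.s.,

$$\mathbb{P}_\theta\left(\theta \in C_{1-\alpha,T}(X_{1:T}) \mid X_{1:T}\right) = \mathbb{E}_\theta\left[\mathbb{1}_{C_{1-\alpha,T}(X_{1:T})}(\theta) \mid X_{1:T}\right] = \mathbb{1}_{C_{1-\alpha,T}(X_{1:T})}(\theta) = \begin{cases} 1 & \text{if } \theta \in C_{1-\alpha,T}(X_{1:T}) \\ 0 & \text{otherwise} \end{cases}$$

where the second equality comes from the upcoming Lemma 3 i). Then, $\mathbb{P}_\theta(\theta \in C_{1-\alpha,T}(X_{1:T}) \mid X_{1:T}) \geq
1 - \alpha$ m.a.e., for all $\theta \in \Theta$, if, and only if, $\theta \in C_{1-\alpha,T}(X_{1:T})$ $\mathbb{P}_\theta$-a.s. m.a.e. for all $\theta \in \Theta$.
Now, by Assumption 8(b), for all $\theta \in \Theta$, $\mathbb{P} \ll \mathbb{P}_\theta$. Thus, $C_{1-\alpha,T}(X_{1:T}) = \Theta$ $\mathbb{P}$-a.s. m.a.e.

ii) By definition of conditional probabilities, $\mathbb{P}$-a.s.,

$$\mathbb{P}\left(d_T(X_{1:T}) = d_A \mid X_{1:T}\right) = \mathbb{E}\left[\mathbb{1}_{\{d_T(X_{1:T}) = d_A\}} \mid X_{1:T}\right] = \mathbb{1}_{\{d_T(X_{1:T}) = d_A\}} = \begin{cases} 1 & \text{if } d_T(X_{1:T}) = d_A \\ 0 & \text{if } d_T(X_{1:T}) = d_H \end{cases}$$

where the second equality comes from the upcoming Lemma 3 ii). Now, $\mathbb{P}(d_T(X_{1:T}) =
d_A \mid X_{1:T}) \leq \alpha$ m.a.e., if, and only if, $d_T(X_{1:T}) = d_H$ $\mathbb{P}$-a.s. m.a.e. Thus, the result
follows. □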


Proposition 3 ii) shows that, when all data at use have been previously realized, only
Neyman-Pearson tests with zero probability of type I error m.a.e. are possible. Although
it is possible to design such tests, most available tests have a positive probability of
type I error m.a.e. In addition, the Neyman-Pearson approach to testing is to minimize
the probability of type II error rather than the probability of type I error (Neyman and
Pearson, 1933).

Proposition 3 i) shows that, when all data at use have been previously realized, the
only possible Neyman-Pearson confidence region is the whole parameter space $\Theta$ $\mathbb{P}$-a.s.
m.a.e. Such a confidence region is uninformative. The assumptions of Proposition |^) 
that are new w.r.t. Assumptions and are mild. When the unknown parameter 6q 
can be any value inside the parameter space ©, it seems difficult to see how a confidence 
region cannot be pivotal. Thus, Assumption [T] and are often part of the definition of 
confidence regions. E.g., Ferguson, 1967, sec. 5.8; Gourieroux and Monfort, 1989/1996, 
sec. 20. 

The combination of Theorem 1 and Proposition 3 suggests that Neyman-Pearson theory
is inadequate for past-realized data. This inadequacy is stronger than the inadequacy for
multiple use of the same data, because used data are necessarily a subset of the realized
data. Moreover, conditioning on the past realization instead of on the knowledge is more
in line with Neyman-Pearson theory for at least two reasons. First, unlike conditioning on
knowledge, conditioning on past realizations is not individual-specific, which is a feature
often presented as an advantage of the Neyman-Pearson theory over Bayesian theory. Second,
unlike Bayesian theory, the Neyman-Pearson theory distinguishes between unknown
and random quantities, so that it is difficult to understand why past-realized data, which
are now fixed, should be regarded as random.

Remark 20. If randomized test decision rules are allowed, Proposition 3 ii) is weaker. In
this case, if, instead of a level of the test, the probability of type I error is fixed, it can be
shown that randomized test decision rules $d_T$ and the data are independent under H.
In this paper, we do not consider randomized test decision rules in detail for brevity and
relevance: they are rarely used in econometrics. Moreover, Theorem 1 remains mainly
unchanged for randomized test decision rules. ⋄

Remark 21. While the Neyman-Pearson inadequacy for multiple use of the same data has 
been mentioned in the literature, the inadequacy for past-realized data is novel to the 
best of our knowledge. o 

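Lemma 3. i) Under the notations and assumptions of Proposition 3 i), for all $\theta \in \Theta$,
the indicator function $\mathbb{1}_{C_{1-\alpha,T}(X_{1:T}(.))}(\theta) : \Omega \to \{0,1\}$ is $\sigma(X_{1:T})/\mathcal{P}(\{0,1\})$-measurable.

ii) Under the notations and assumptions of Proposition 3 ii), the indicator function
$\mathbb{1}_{\{d_T(X_{1:T}(.)) = d_A\}} : \Omega \to \{0,1\}$ is $\sigma(X_{1:T})/\mathcal{P}(\{0,1\})$-measurable.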

Proof, i) Let 9 E &. For this proof, define the functions / : Sj^ —)■ {0,1} and h : 
Li ^ {0,1} s.t. /(.) := 1 ci_c,t(.)(^) and h(.) := 1 ci_„,t(Ai:t(.))(^)- Then, ^-^(l) = 
(/ °-^1:t)~^(1) = -T^:t[/~^(1)] ~ ^1:t({^1:T E Sj^ '. 9 E Cl-a,r(3^1:T)}) ^ Cr(-^1:T), where 
the last equality follows the defining property (ii) of confidence regions (see Definition 
on p. and Assumption [^c). Now, $\sigma(\{1\}) = \mathcal{P}(\{0,1\})$, and inverse mapping preserves
the set operations that generate a $\sigma$-algebra (e.g., Kallenberg, 1997/2002, p. 3, eq. (1)).
Thus, the result follows.

ii) Follow the same reasoning as for (i) with $f(.) := \mathbb{1}_{\{d_T(.) = d_A\}}$ and $h(.) := \mathbb{1}_{\{d_T(X_{1:T}(.)) = d_A\}}$.

□


Appendix D. Existence of neoclassical confidence regions 

In this Appendix, we prove the existence of neoclassical confidence regions under mild
assumptions. We adapt a proof from Holcblat (2012). For notational convenience, we
omit the qualification m.a.e., although all equalities should be understood m.a.e.

Proposition 4 (Existence of neoclassical confidence regions). Under Assumptions^
and[^ for all $\alpha \in [0,1]$, there exists a neoclassical confidence region, $R_{1-\alpha,T}$.

Proof. Dehne, for all a E [0,1], 


k := snp [k : F o 6^ ^ ({0 G © : fe^{0) ^ k}) ^ 1 — a} 
fceR ^ 

On the one hand, under Assumptions and |Po9r‘({eee:Aj(9)>0}) =1. 
On the other hand, under Assumptions Hi and by the upcoming Lemma for all 
a E [0,1], there exists k E R+U {cxd} s.t. k ^ k implies Po6*^ ^ E © : /g* (9) > k}) < 
1 — a. Thus, there exists an increasing sequence (^n)„^i with to knt k such that Vn ^ 1, 
P o ({6' : fe:^{6) ^ kn}) ^ 1 - a. Then, by Lemma|^), 

Po0-~i (1^ G © : fe-^{e) ^ fc}) = Jim PO0--1 ({0 G © : /,.(0) ^ ^4) 

^1—0. 


Now, under Assumption 


^ e © : fe-je) ^ k> E T©, so that, by construction, it is a 


neoclassical conhdence region. 


□ 


Lemma 4. Under Assumptions\^^ and[^ 

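i) for $k \geq 0$, $k \mapsto \mathbb{P} \circ \theta_T^{*-1}\left(\{\theta \in \Theta : f_{\theta^*_T}(\theta) \geq k\}\right)$ is a left-continuous decreasing function;

ii) for all $\alpha \in [0,1]$, there exists $k \in \mathbb{R}_+ \cup \{\infty\}$ s.t. $\mathbb{P} \circ \theta_T^{*-1}\left(\{\theta \in \Theta : f_{\theta^*_T}(\theta) \geq k\}\right) \leq 1 - \alpha$.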

Proof, i) Under Assumptionj^ Po^^”^ is probability measure, so that k ha Po^^”^ (|6 * G © 
is decreasing by monotonicity of measures. Prove left-continuity. Let {kn)n^i s.t. kn t 
k G R+ U {c)o}. Then, 

P O -1 ({0 G © ; /,. (0) ^ fc}) = P O -1 I Pi G © ; /,. (0) ^ ^4 

\n^l / 

= hmPo0«-4{0G©;/,40)^fc4) 

where the last equality follows from a standard continuity property of measures under 
Assumption (e.g. in Kallenberg, 1997/2002, p. 8, Lemma 1.14). 
ii) For all real number k > 0, 


'€> 


id) 


© 

lim 1 


lim 

k^oo 


^lim/ 040 )l{^ge:/,. = 0 /i-a.e. 

lim f / 0 -( 0 )l{ 06 ©:/^.( 0 )^fc}(^)h(d 0 ) = [ lim fe-^{0)\ee&-.f,. mk}i^)pW = 0 


k^oo 


(a) By Assumption 1^ fe^{6)fi{d6) = F o 6^ ^(©) = 1. (b) Under Assumption for 

all k ^ 1, l{ 0 g©:/^^( 0 )^fe} ^ k\{ee&:fg.^{d)^k} ^ /e*( 6 ') where /© I/e* ( 6 ')|/i(d 6 ') < cx). Thus 
apply Lebesgue’s dominated convergence theorem, (c) First, by dehnition of Lebesgue’s 
integral, hm^^oo l{eg©./g. (e)^fe}( 6 ') = 0 /i-a.e. (e.g., in Kallenberg, 1997/2002, p. 13, 

Lemma 1.24). Second, Assumption implies that /e^(0) is hnite /i-a.e.: if there existed 
B E S& s.t. /i(R) > 0 and /e* = oo on R, then, by dehnition of Lebesgue’s integral, 
oo = fg^{6)fi{d6) ^ fg^(d)fj.(dd) = Po 6 '^“^(©) = 1 . (d) Under Assumptionj^ for 

all fc G R, |/e^(^)l{eg©;/^.(e)^fc}(^)| < fe-^{0) where /q |/ 0 ^( 0 )|/i(d 0 ) < cx). Thus apply 
Lebesgue’s dominated convergence theorem. 
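Therefore, for all $\alpha \in [0,1]$, there exists $k \in \mathbb{R}_+ \cup \{\infty\}$ s.t. $\mathbb{P} \circ \theta_T^{*-1}\left(\{\theta \in \Theta : f_{\theta^*_T}(\theta) \geq k\}\right) =
\int_\Theta f_{\theta^*_T}(\theta)\, \mathbb{1}_{\{\theta \in \Theta : f_{\theta^*_T}(\theta) \geq k\}}(\theta)\, \mu(d\theta) \leq 1 - \alpha$. □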

Appendix E. Analyses of the approximation error in the Example and some variants of it


The central object of study in assessing an applied neoclassical method is the distribution
of the approximation of $\mathbb{P} \circ \theta_T^{*-1}$. This problem can conceptually be dealt with in the same way as in
the classical case. We here include some illustrations for deriving aspects of the law of
this approximation with respect to the probability measure of the data generating process. Generalizing
this investigation parallels the development of classical statistical inference theory.
We here mean merely to point out that such an investigation is a matter of mathematical
development and sophistication, and is not a conceptual problem for the neoclassical
framework.


In subsection E.l we study the distribution of the cumulative distribution function 
and density of P o 9Tp~^ where we restrict attention to the example with i.i.d A/'(6*o, s) 

we study the concentration of P o 9^~^ around P o 9^~^ 


observations. In subsection 


E.2 


when Af{XT, s) is used to approximate the distribution of = E(Ai) when Xi-^t are 
i.i.d. observations not necessarily from a Gaussian distribution. In the Gaussian case, we 
deduce the exact hnite-sample distribution of the distance between P o 9!^~^ and P o 9^~^ 
for the Hellinger and Wasserstein distances. 


E.l. The exact distribution of P o 9^~^. Let Fo^{x,uj) be the cumulative distribution 
function induced by P o Gonsider the basic example with i.i.d. J\f(9o,s) observa¬ 
tions, so that Fq^{x,u) = Tl( 0; 1). Let us identify the distribution of Fe*(a:), 

first when x is fixed, and, for the simpler case when s is assumed known also deal with 
X I—)■ F 0 ^{x) as a stochastic process. 

For a given x, we have that the distribution of the random variable F 0 ^{x,u) is 
known exactly when the observations are i.i.d. N'{9o,s), since clearly Hx{y) := P(a; G 
n : Fg-^{x,u) ^ y) = P(a; G G : ‘^{[x - Xt{uj)]/[st{uj) / Vf]]Q]l) ^ y) = P([a; - 
Xt\/[st/^/T] ^ 0; 1)) where is the inverse function of Tl(x;0;l), 
i.e. the standard Normal quantile function. It is well known that W := (T — l)s
Xt -1 is such that (X^, W) are independent. Hence Z := {x — X)/{s/y/T) ~ M{—x/s, 1) 
and W are also independent. We hence see that is the cumulative distribution function 
of a quotient of two independent random variables with a known distribution, and that 
Hx is therefore easy to obtain. Clearly, only depends on T and the ratio x/s. If s is 
assumed known, the distribution is known exactly. In the case of s known, we also note 
that F 0 ^{x) = fh(G(a;);0; 1) where G is the continuous Gaussian process G{x) = VT{x — 
9q)/s + e where e ~ A/'(0,1) and that fe^{x) = {d/dx)FQ^{x) equals \/Tn(G(a;); 0; l)/s. 
Finally, arg maxa,(x) = argmax^. A/Tn(G(a;); 0; l)/s = argmax^, n(G(x); 0; 1) is the so¬ 
lution to G{x) = 0, i.e. we regain the observation that argmax^, fe!^{x) = Oq — {s/VT)e ~ 
J\f{9o,s/VT). 

E.2. A probability bound for $\rho(\widehat{\mathbb{P}\circ\theta_T^{*-1}},\mathbb{P}\circ\theta_T^{*-1})$. A feature of $\widehat{\mathbb{P}\circ\theta_T^{*-1}}$ which is of special interest is its concentration around $\mathbb{P}\circ\theta_T^{*-1}$. Consider $\varrho=\rho(\widehat{\mathbb{P}\circ\theta_T^{*-1}},\mathbb{P}\circ\theta_T^{*-1})$ based on some metric $\rho$ on the space of all probability measures on $(\Theta,\mathcal{E}_\Theta)$. As $\widehat{\mathbb{P}\circ\theta_T^{*-1}}$ is data-dependent, clearly $\varrho$ is a random mapping. Studying the law of $\varrho$ directly is generally complicated. This problem is, however, connected to several well-studied problems in classical statistics and probability. We here illustrate some basic issues relating to the study of $\varrho$.

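Let $\mathbb{P}^{0}\circ\theta_T^{*-1}$ be a distribution that is known to approximate $\mathbb{P}\circ\theta_T^{*-1}$ on some appropriate scale. As in the proof of Lemma we use the triangle inequality to see that

$$\varrho\le\rho(\widehat{\mathbb{P}\circ\theta_T^{*-1}},\mathbb{P}^{0}\circ\theta_T^{*-1})+\rho(\mathbb{P}^{0}\circ\theta_T^{*-1},\mathbb{P}\circ\theta_T^{*-1}).\qquad(11)$$

If $\mathbb{P}^{0}\circ\theta_T^{*-1}$ is well-chosen, this bound can be used to derive bounds for the exceedance probabilities of $\varrho$. We note that this triangle inequality bound is general, but is likely to be crude compared to other bounds where more of the structure of the problem is used.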

In the case when $\sqrt{T}(\theta_T^*-\theta_0)\xrightarrow{D}\mathcal{N}(0,\Sigma)$ as $T\to\infty$, for some covariance matrix $\Sigma$, the natural choice of $\mathbb{P}^{0}\circ\theta_T^{*-1}$ is the distribution $\mathcal{N}(\theta_0,\Sigma^{1/2}/\sqrt{T})$. In this case, we typically have that $\widehat{\mathbb{P}\circ\theta_T^{*-1}}$ has the data-dependent distribution $\mathcal{N}(\theta_T^*(\omega),\hat\Sigma_T^{1/2}(\omega)/\sqrt{T})$, and we see that the first term in the above display quantifies the loss in precision in not knowing the population parameters that are in the Normal approximation of the law of $\theta_T^*$.

The first term in the bound of eq. (11) compares the distance between two exactly Normal distributions, one of which is data-dependent. This is the only stochastic term in eq. (11), and bounds, or even exact expressions, for comparing the laws of two Normals are available for several probability metrics, as we will illustrate shortly.

The second term in the bound of eq. (11) quantifies how good this Normal approximation is to the distribution of $\theta_T^*$ if these parameters are known, and is a well-studied problem, especially for the Kolmogorov metric where the celebrated Berry-Esseen Theorem applies.


E.2.1. The i.i.d. Gaussian case. In our example with i.i.d. $\mathcal{N}(\theta_0,s)$ observations, both the distribution of $\theta_T^*$ and the approximation $\mathcal{N}(\bar X_T(\omega),s_T(\omega)/\sqrt{T})$ are Gaussian and we can work directly with $\varrho=\rho(\mathcal{N}(\bar X_T,s_T/\sqrt{T}),\mathcal{N}(\theta_0,s/\sqrt{T}))$.

When $\rho$ is the Hellinger or total variation distance, we will see that $\varrho$ does not converge to zero, but it does converge to zero with the Wasserstein distance. The Wasserstein distance is a finer metric than the Prohorov and the Lévy metric. If at least one of the probability distributions that are compared has a density with respect to Lebesgue measure with finite supremum, which is the case when comparing two Gaussians, the Lévy metric is in turn equivalent to the Kolmogorov metric (Gibbs and Su, 2002). On the ladder of probability metrics, the fact that $\varrho$ is $o_P(1)$ when $\rho$ is the Wasserstein metric but not when $\rho$ is the Hellinger metric gives information on how fine-grained the neoclassical approximation is in an elementary case.

Let $\rho$ be the Hellinger distance, i.e., the metric on the space of probability measures with densities with respect to Lebesgue measure given by $\rho_H(\nu_1,\nu_2)=\left[\int_{\mathbb{R}}(\sqrt{f_1}-\sqrt{f_2})^2\,d\lambda\right]^{1/2}$ where $f_1,f_2$ are the densities with respect to Lebesgue measure of $\nu_1,\nu_2$ respectively (Gibbs and Su, 2002). Letting $f_1,f_2$ be the densities of $\mathcal{N}(\bar X_T,s_T/\sqrt{T})$ and $\mathcal{N}(\theta_0,s/\sqrt{T})$ respectively, we see that

$$1-\varrho^2/2=\int_{\mathbb{R}}\sqrt{f_1f_2}\,d\lambda=\left[\frac{2(s_T/\sqrt{T})(s/\sqrt{T})}{s_T^2/T+s^2/T}\right]^{1/2}\exp\left\{-\frac{1}{4}\,\frac{(\bar X_T-\theta_0)^2}{s_T^2/T+s^2/T}\right\},$$

which shows that

$$\varrho=\sqrt{2}\left(1-\left[\frac{2s_Ts}{s_T^2+s^2}\right]^{1/2}\exp\left\{-\frac{Z_T^2}{4}\,\frac{s^2}{s_T^2+s^2}\right\}\right)^{1/2},$$

where $Z_T:=\sqrt{T}(\bar X_T-\theta_0)/s\sim\mathcal{N}(0,1)$. If $s$ is assumed known, we can replace $s_T$ with $s$ in the above expression and get $\varrho=\sqrt{2}\sqrt{1-\exp(-Z_T^2/8)}$, which has a distribution that does not change with $T$. In both cases, $\varrho$ does not converge to zero, but is a non-degenerate random variable. The same conclusion holds for the total variation metric, since it is equivalent to the Hellinger metric (Gibbs and Su, 2002).
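
The Hellinger computation can be checked numerically. The sketch below evaluates the standard closed form for two univariate Normals (parameterized by mean and standard deviation, as in the text), compares it with a direct numerical integration of $(\sqrt{f_1}-\sqrt{f_2})^2$, and evaluates the known-$s$ expression $\sqrt{2}\sqrt{1-\exp(-Z_T^2/8)}$. Function names and numerical settings are assumptions made for this illustration.

```python
import math


def normal_pdf(x, mu, sd):
    """Density of a Normal with mean mu and standard deviation sd."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def hellinger_normal(mu1, sd1, mu2, sd2):
    """Closed form of [int (sqrt f1 - sqrt f2)^2]^(1/2) for two Normals."""
    var_sum = sd1 ** 2 + sd2 ** 2
    affinity = math.sqrt(2.0 * sd1 * sd2 / var_sum) * math.exp(
        -((mu1 - mu2) ** 2) / (4.0 * var_sum)
    )
    return math.sqrt(2.0 * max(0.0, 1.0 - affinity))


def hellinger_numeric(mu1, sd1, mu2, sd2, n=20000, width=12.0):
    """Midpoint-rule integration of (sqrt f1 - sqrt f2)^2 over a wide interval."""
    lo = min(mu1 - width * sd1, mu2 - width * sd2)
    hi = max(mu1 + width * sd1, mu2 + width * sd2)
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        diff = math.sqrt(normal_pdf(x, mu1, sd1)) - math.sqrt(normal_pdf(x, mu2, sd2))
        total += diff * diff
    return math.sqrt(total * h)


def rho_known_s(z):
    """Known-s case: the distance depends on the data only through Z_T = z."""
    return math.sqrt(2.0) * math.sqrt(1.0 - math.exp(-z * z / 8.0))


if __name__ == "__main__":
    print(hellinger_normal(0.3, 0.5, 0.0, 0.4), hellinger_numeric(0.3, 0.5, 0.0, 0.4))
```

With equal standard deviations $s/\sqrt{T}$ and mean gap $sZ_T/\sqrt{T}$, `hellinger_normal` returns the same value as `rho_known_s(z)` for every $T$, which is the sense in which the distribution of $\varrho$ does not change with $T$.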

Let $\rho$ be the Wasserstein distance, i.e., the metric on the space of probability measures given by $\rho_W(\nu_1,\nu_2)=\int_{\mathbb{R}}|F_1(x)-F_2(x)|\,dx$ where $F_1,F_2$ are the cumulative distribution functions of $\nu_1,\nu_2$ respectively (Gibbs and Su, 2002). Letting $F_1,F_2$ be the cumulative distribution functions of $\mathcal{N}(\bar X_T,s_T/\sqrt{T})$ and $\mathcal{N}(\theta_0,s/\sqrt{T})$ respectively, Givens and Shortt (1984, equation (4)) show

$$\varrho=\rho_W(F_1,F_2)=\sqrt{|\bar X_T-\theta_0|^2+s_T^2/T+s^2/T-2ss_T/T}\qquad(12)$$

which clearly goes to zero in probability. When $s$ is assumed known, we see that $\varrho=|\bar X_T-\theta_0|=s|\mathcal{N}(0,1)|/\sqrt{T}$, which follows a folded Normal distribution, so that $\mathbb{E}\varrho=sT^{-1/2}\sqrt{2/\pi}$. In both cases, $\varrho=O_P(T^{-1/2})$.
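
As a rough numerical companion to eq. (12), the sketch below evaluates the expression for two univariate Normals and simulates the known-$s$ case to compare the average distance with $s\sqrt{2/(\pi T)}$. The expression under the square root is rewritten as $(\mu_1-\mu_2)^2+(\sigma_1-\sigma_2)^2$, which is algebraically the same. Function names, the seed and the parameter values are assumptions made for this illustration.

```python
import math
import random


def wasserstein_normal(mu1, sd1, mu2, sd2):
    """Right-hand side of eq. (12): sqrt((mu1 - mu2)^2 + (sd1 - sd2)^2)."""
    return math.hypot(mu1 - mu2, sd1 - sd2)


def mean_distance_known_s(theta0, s, T, draws=20000, seed=1):
    """Monte Carlo average of the distance when s is known (equal std devs)."""
    rng = random.Random(seed)
    sd = s / math.sqrt(T)
    total = 0.0
    for _ in range(draws):
        xbar = rng.gauss(theta0, sd)  # sample mean of T i.i.d. Normal draws
        total += wasserstein_normal(xbar, sd, theta0, sd)
    return total / draws


if __name__ == "__main__":
    s, T = 2.0, 50  # hypothetical values
    print(mean_distance_known_s(0.0, s, T), s * math.sqrt(2.0 / (math.pi * T)))
```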

E.2.2. The general i.i.d. case. Let us now work with the case with i.i.d. observations from a distribution with finite third order moment $\zeta$, variance $s^2$ and expectation $\theta_0$, but not necessarily absolutely continuous with respect to Lebesgue measure. We use $\bar X_T$ as a proxy for $\theta_0$ and, motivated by the Central Limit Theorem, approximate $\mathbb{P}\circ\theta_T^{*-1}$ with $\mathcal{N}(\bar X_T(\omega),s_T(\omega)/\sqrt{T})$. Because $\mathbb{P}\circ\theta_T^{*-1}$ is not Normal, we resort to the bound in eq. (11). For the second term in the bound, we will use the Berry-Esseen bound, which is given in terms of the Kolmogorov distance $\rho_K(\nu_1,\nu_2)=\sup_x|F_1(x)-F_2(x)|$ where $F_1,F_2$ are the c.d.f. induced by $\nu_1,\nu_2$ respectively.

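From eq. (11), we see $\rho_K(\mathcal{N}(\bar X_T,s_T/\sqrt{T}),\mathbb{P}\circ\theta_T^{*-1})\le\rho_1+\rho_2$ where $\rho_1:=\rho_K(\mathcal{N}(\bar X_T,s_T/\sqrt{T}),\mathcal{N}(\theta_0,s/\sqrt{T}))$ and $\rho_2:=\rho_K(\mathcal{N}(\theta_0,s/\sqrt{T}),\mathbb{P}\circ\theta_T^{*-1})$. The first term compares two Gaussians. Calculations show that $\rho_1=|\mathfrak{N}(Q;0;1)-\mathfrak{N}(s_TQ/s+Z_T;0;1)|$ where $Z_T=\sqrt{T}(\bar X_T-\theta_0)/s\sim\mathcal{N}(0,1)$ and $Q=\left(s_TZ_T/s-\sqrt{Z_T^2+2(1-s_T^2/s^2)\log(s/s_T)}\right)/(1-s_T^2/s^2)$. We hence have an exact expression for $\rho_1$, and we see it is $o_P(1)$. We have $\rho_K(\mathcal{N}(\theta_0,s/\sqrt{T}),\mathbb{P}\circ\theta_T^{*-1})=\sup_{x\in\mathbb{R}}|\mathfrak{N}(\sqrt{T}(x-\theta_0)/s;0;1)-\mathbb{P}(\bar X_T\le x)|=\sup_{x\in\mathbb{R}}|\mathfrak{N}(\sqrt{T}(x-\theta_0)/s;0;1)-\mathbb{P}(Z_T\le\sqrt{T}(x-\theta_0)/s)|=\sup_{z\in\mathbb{R}}|\mathfrak{N}(\sqrt{T}z;0;1)-\mathbb{P}(Z_T\le\sqrt{T}z)|=\sup_{z\in\mathbb{R}}|\mathfrak{N}(z;0;1)-\mathbb{P}(Z_T\le z)|$, which is bounded by $C\zeta s^{-3}/\sqrt{T}$ by the Berry-Esseen bound for a constant $C<0.4748$ (Shevtsova, 2011).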
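
To make the Berry-Esseen term concrete, the sketch below takes i.i.d. Bernoulli($p$) observations as a hypothetical example of a distribution that is not absolutely continuous, computes the Kolmogorov distance between the law of $Z_T$ and the standard Normal exactly from the binomial probabilities, and compares it with $C\zeta s^{-3}/\sqrt{T}$. Here $\zeta$ is taken to be the absolute third central moment and $C=0.4748$; both choices, like the function names, are assumptions of this illustration.

```python
import math


def std_normal_cdf(z):
    """Standard Normal c.d.f. via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def berry_esseen_bound(zeta, s, T, C=0.4748):
    """Upper bound C * zeta / (s^3 * sqrt(T)) on the Kolmogorov distance."""
    return C * zeta / (s ** 3 * math.sqrt(T))


def kolmogorov_distance_bernoulli(p, T):
    """sup_z |Phi(z) - P(Z_T <= z)| for the standardized mean of T Bernoulli(p)."""
    s = math.sqrt(p * (1.0 - p))
    cdf_left = 0.0  # P(sum <= k - 1), the left limit at the k-th jump
    worst = 0.0
    for k in range(T + 1):
        z = (k - T * p) / (s * math.sqrt(T))  # location of the k-th jump
        pmf = math.comb(T, k) * p ** k * (1.0 - p) ** (T - k)
        cdf_right = cdf_left + pmf
        phi = std_normal_cdf(z)
        # The supremum is attained at a jump, from the left or from the right.
        worst = max(worst, abs(phi - cdf_left), abs(phi - cdf_right))
        cdf_left = cdf_right
    return worst


if __name__ == "__main__":
    p, T = 0.5, 30  # hypothetical values
    s = math.sqrt(p * (1.0 - p))
    zeta = p * (1.0 - p) * ((1.0 - p) ** 2 + p ** 2)  # E|X - p|^3 for Bernoulli(p)
    print(kolmogorov_distance_bernoulli(p, T), berry_esseen_bound(zeta, s, T))
```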

Appendix F. Consistency of approximations 


In this appendix, we investigate the consistency of the approximations presented in subsection 5. As in the main text of this paper, we use the Prokhorov metric, and the Lemma 1 (p. 24). The following Lemma ensures that the condition (b) of Lemma 1 is satisfied, so that it only remains to check the condition (a) of Lemma 1 for each approximation.

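Lemma 5. Let $\rho_P(.,.)$ be the Prokhorov metric. Under and A if $\theta_T^*\xrightarrow{\mathbb{P}}\theta_0$ as $T\to\infty$, then $\rho_P(\mathbb{P}\circ\theta_T^{*-1},\delta_{\theta_0})\to0$ as $T\to\infty$.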


Proof. Convergence in probability implies convergence in law (e.g., Kallenberg, 1997/2002, 
Lemma 4.7), which, in turn, is equivalent to the convergence w.r.t. pp. □ 

F.1. Consistency of calibration. In this subsection, we study the asymptotic properties of calibration from a neoclassical point of view under the assumption that calibration yields consistent parameter values. This assumption should hold if the calibrated parameter value corresponds to estimates from existing empirical studies, or to the minimizer of some goodness-of-fit measure (e.g., Gourieroux and Monfort, 1996, sec. 2.1.2.). For criterion-adjusted calibration, we require the following properties from the criterion function.


Assumption 9. (a) The criterion function u : x —)■ R_|. equals zero out¬ 
side and 9 h-)■ u{9,9) is Borel measurable for all 0 G ©. (b) For all 9 E &, 

9 HA- u{9,9) is continuous in a neighborhood of 9 q. (c) There exists ri > 0 s.t. 

fs^^PdeBr (0o) ^ where i?ri(^o) denotes a ball in © centered at 9 q with 

radius ri. (d) There exists r 2 > 0 s.t., for all 9 G 5^2(^o); 0 < f^u(9,9)A(d9). 


Assumption 9(a) requires the criterion function to be positive and to take zero values outside $\Theta^2$. As explained in Remark 16 on p. 31, Assumption 9(a) (combined with Assumption 9(d)) allows us to transform criterion functions into a p.d.f., so that we remain within the elementary framework of section 4.2. Assumption 9(b) requires continuity of the criterion function, which is a standard and relatively mild requirement: continuous functions are dense in the set of Borel measurable functions (Kallenberg, 1997/2002, p. 19, Lemma 1.37). Assumption 9(c) allows us to use the Lebesgue dominated convergence theorem. Compactness of $\Theta$, and continuity of the criterion function, would ensure Assumption 9(c). Assumption 9(d), which seems mild, allows us to normalize the criterion function, so that it integrates to one.


Proposition 5. Denote the Prokhorov metric with pp. Under Assumptions\^ if 

P 

6*0 as T —)■ cx), then 


It 

i) doff) —)■ 0, as T —)■ oo, which, in turn, implies consistency of plain cali¬ 

bration by Lemmas \^and\^- 


ii) under the additional Assumption 


pp 


f p)A(de) / u(9,eo)\(de) \ p 


—)■ 0, as 


/e «(0,eo)A(d0) j 

T —)■ oo, which, in turn, implies consistency of weighted calibration by Lemmas^ 
and\^ 


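Proof. i) By the Portmanteau theorem, it is sufficient to check the point-wise convergence of cumulative distribution functions (c.d.f.) at the continuity points of the limiting c.d.f.: for all $\theta\in\Theta$ that are continuity points of $1_{[\theta_0,\infty[}(.)$, $1_{[\theta_{T,P}^*,\infty[}(\theta)\xrightarrow{\mathbb{P}}1_{[\theta_0,\infty[}(\theta)$, as $T\to\infty$, by assumption.

ii) Under Assumption 9(a)-(c), by the Lebesgue dominated convergence theorem, for all $B\in\mathcal{E}_\Theta$, as $T\to\infty$,

$$\int_B u(\theta,\theta_{T,P}^*)\lambda(d\theta)\xrightarrow{\mathbb{P}}\int_B u(\theta,\theta_0)\lambda(d\theta).$$

Therefore, by Assumption 9(d), for all $B\in\mathcal{E}_\Theta$, as $T\to\infty$,

$$\frac{\int_B u(\theta,\theta_{T,P}^*)\lambda(d\theta)}{\int_\Theta u(\theta,\theta_{T,P}^*)\lambda(d\theta)}\xrightarrow{\mathbb{P}}\frac{\int_B u(\theta,\theta_0)\lambda(d\theta)}{\int_\Theta u(\theta,\theta_0)\lambda(d\theta)},$$

where the LHS and RHS are probability measures on $(\Theta,\mathcal{E}_\Theta)$ by Assumption 9(a) and standard properties of the Lebesgue integral. Then the result follows by the Portmanteau theorem. □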
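
The normalization step in part ii) can be illustrated numerically. The sketch below uses a hypothetical criterion function $u(\theta,c)=\exp\{-\frac{1}{2}((\theta-c)/0.1)^2\}$ on $\Theta=[0,1]$, normalizes it into a probability measure by a midpoint rule, and tracks the mass of a set $B=[a,b]$ as the calibrated value $c$ approaches $\theta_0$. The criterion, the parameter space, the set $B$ and all names are assumptions made only for this illustration.

```python
import math


def gaussian_kernel(theta, c, scale=0.1):
    """Hypothetical criterion function u(theta, c); not taken from the paper."""
    return math.exp(-0.5 * ((theta - c) / scale) ** 2)


def normalized_mass(u, c, a, b, lo=0.0, hi=1.0, n=2000):
    """Mass that the normalized criterion theta -> u(theta, c) puts on [a, b]."""
    h = (hi - lo) / n
    num = 0.0
    den = 0.0
    for i in range(n):
        theta = lo + (i + 0.5) * h  # midpoint of the i-th cell of Theta
        val = u(theta, c)
        den += val
        if a <= theta <= b:
            num += val
    return num / den


if __name__ == "__main__":
    theta0 = 0.5  # hypothetical limit of the calibrated value
    limit = normalized_mass(gaussian_kernel, theta0, 0.4, 0.6)
    for T in (10, 100, 1000):
        mass = normalized_mass(gaussian_kernel, theta0 + 1.0 / T, 0.4, 0.6)
        print(T, abs(mass - limit))
```

In this example the gap between the two normalized masses shrinks as the calibrated value approaches $\theta_0$, which is the set-wise convergence used in the proof.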


F.2. Consistency of Gaussian and Laplace approximations. In this subsection, we show that asymptotic normality of $\widehat{\mathbb{P}\circ\theta_T^{*-1}}$ implies consistency of $\widehat{\mathbb{P}\circ\theta_T^{*-1}}$ under mild assumptions, i.e., asymptotic normality of $\widehat{\mathbb{P}\circ\theta_T^{*-1}}$ implies condition (a) of Lemma 1. Then, from this general result, we deduce consistency of the Gaussian and Laplace approximations.

F.2.1. Consistency from asymptotic normality. It is well-known that asymptotic normality implies consistency for random variables. To the best of our knowledge, there is no result available for random probability measures. We require the following two mild Assumptions 10 and 11 to prove this result for $\widehat{\mathbb{P}\circ\theta_T^{*-1}}$. Assumption 10 ensures that we can study the convergence of $\widehat{\mathbb{P}\circ\theta_T^{*-1}}$ on the space of probability measures.


Assumption 10. The approximation of the distribution of the generic proxy is a random 
probability measure from to 0 for T big enough P-a.s., i.e., for T big enough, (a) 
[measurability condition] for all B G T©, Xi:oo ^ Fo is S^/B-measurable; and 

(b) [probability condition] P o 9^~^ is a probability measure P-a.s. 


Assumption 10 requires that we can approximate the distribution of $\theta_T^*$ $\mathbb{P}$-a.s. in a measurable way. The following Assumption 11 allows us to consider the Cholesky decomposition of the proxy of the asymptotic variance-covariance matrix, and its inverse.


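Assumption 11. $\tilde\Sigma_T\xrightarrow{\mathbb{P}}\tilde\Sigma$, where $\tilde\Sigma$ is a positive-definite matrix, and $(\tilde\Sigma_T)_{T\in[1,\infty[}$ is a sequence of $\sigma(X_{1:\infty})/\mathcal{B}(\mathbb{R}^{p\times p})$-measurable matrices that are symmetric w.p.a.1, as $T\to\infty$.


Remark 22. $\tilde\Sigma$ does not need to be $\Sigma$, which is the asymptotic variance of $\sqrt{T}(\theta_T^*-\theta_0)$: see Assumption on p. 23.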



The following Lemma is the main result of this subsection.

Lemma 6. Denote the c.d.f. of $\widehat{\mathbb{P}\circ\theta_T^{*-1}}$ with $F_{\theta_T^*}$, i.e., for all $\theta\in\Theta$, $F_{\theta_T^*}(\theta)=\widehat{\mathbb{P}\circ\theta_T^{*-1}}(]-\infty,\theta])$. Under Assumptions 10 and 11, if, as $T\to\infty$,

(a) $\tilde\theta_T\xrightarrow{\mathbb{P}}\theta_0$, where $\tilde\theta_T$ is $\sigma(X_{1:\infty})/\mathcal{E}_\Theta$-measurable,

(b) for all b G W, Fg.^ (9 t + 4 m{b-, 0; 


then, as $T\to\infty$,

$$\rho_P(F_{\theta_T^*},\delta_{\theta_0})\xrightarrow{\mathbb{P}}0,$$

which, in turn, implies consistency of $\widehat{\mathbb{P}\circ\theta_T^{*-1}}$ by Lemmas 2


Proof. The idea of the proof is to map probability measures to random variables, and then note that asymptotic normality trivially implies consistency for random variables. By the upcoming Lemma 7, for $T$ big enough, there exists a random variable $Z_T:(\Omega\times\Theta,\mathcal{E}_\Omega\otimes\mathcal{E}_\Theta)\to(\Theta,\mathcal{E}_\Theta)$ s.t. for all $\theta\in\mathbb{R}^p$,

$$\mathrm{P}(Z_T\le\theta\,|\,X_{1:\infty})=F_{\theta_T^*}(\theta),\qquad(13)$$

where $\mathrm{P}$ is a probability measure defined by equation (15) in Lemma 7. If a sequence of symmetric matrices converges to a positive-definite matrix, the matrices of the sequence are positive-definite for an index big enough. Thus, by Assumption 11, $\tilde\Sigma_T$ is a positive-definite matrix w.p.a.1 as $T\to\infty$, which implies that it has a Cholesky decomposition w.p.a.1 as $T\to\infty$. Therefore, for $T$ big enough, define


Yt := 


(14) 


^In this statement, we identify the c.d.f. with its corresponding probability measure. Hereafter, we follow the same abuse of notation.

^By Corollary III.2.6 in Bhatia (1997), $\max_{j\in[[1,p]]}|\mathrm{eig}_j(\tilde\Sigma_T)-\mathrm{eig}_j(\tilde\Sigma)|\le\|\tilde\Sigma_T-\tilde\Sigma\|$ where $\mathrm{eig}_j(\tilde\Sigma)$ denotes the eigenvalues of $\tilde\Sigma$ in descending order. Now, all the eigenvalues of $\tilde\Sigma$ are strictly positive, so that the eigenvalues of $\tilde\Sigma_T$ are strictly positive w.p.a.1, as $T\to\infty$.

so that, for all $y\in\mathbb{R}^p$,


P(yT ^ y\X^:oo) = - 0t) ^ y\X^:oo) 


^ y 


P (R I ^ 1 
- t<0>^[yT + ^T-^> 

(b) 

A 0; S 2 ) as T —)■ cx) 


(a) By the $\sigma(X_{1:\infty})$-measurability of $\tilde\theta_T$ and $\tilde\Sigma_T$ (see condition (a), and Assumption 10), disintegration of $\mathrm{P}$ w.r.t. $X_{1:\infty}$ allows us to regard $\tilde\theta_T$ and $\tilde\Sigma_T$ as fixed (e.g., Kallenberg, 1997/2002, Theorem 6.4), and thus to use equation (13). (b) Use condition (b).

Now, on one hand, convergence in probability is equivalent to convergence in $L^1$ for uniformly integrable sequences (e.g., Kallenberg, 1997/2002, Proposition 4.12). On the other hand, $\mathrm{P}(Y_T\le y\,|\,X_{1:\infty})\le1$, which implies that it is uniformly integrable. Therefore, as $T\to\infty$,


E 


P(yT^|/|Xi,oo)-9I(?/;0;S^) 


4 0 


(0 


|p(Kt^i/)-^( 2/;0;S^)| 4o 


^ Ur 4 A7(0, S) , under P 

(c) It 


Vt 

(d) Tt 


— Op 

+ Ot 


^00 


Vf 

Zt 4 6*0 , under P 
^ P(Zt ^ 6*) — l[0o,oo[(6*) at any continuity point 9 of 1 [ 0 q,oo[(-) 

^ A* (^) ^ P{Zt ^ ^|Vi:co) A l|,jo,co[(^) at any continuity point 0 of l|eo,co[(0 

$$\Leftrightarrow\ \rho_P(F_{\theta_T^*},\delta_{\theta_0})\xrightarrow{\mathbb{P}}0.$$

(a) By iterated conditioning and positivity of probabilities, $|\mathrm{P}(Y_T\le y)-\mathfrak{N}(y;0;\tilde\Sigma^{1/2})|=\big|\mathbb{E}|\mathrm{P}(Y_T\le y\,|\,X_{1:\infty})|-\mathbb{E}|\mathfrak{N}(y;0;\tilde\Sigma^{1/2})|\big|\le\mathbb{E}\big|\mathrm{P}(Y_T\le y\,|\,X_{1:\infty})-\mathfrak{N}(y;0;\tilde\Sigma^{1/2})\big|$, where the inequality follows from the reverse triangle inequality for the $L^1$ norm. (b)-(c) Portmanteau theorem. (d) By condition (a), $\tilde\theta_T\xrightarrow{\mathbb{P}}\theta_0$, as $T\to\infty$, which implies that $\tilde\theta_T\xrightarrow{\mathrm{P}}\theta_0$, as $T\to\infty$, by the upcoming Lemma 7(iii). (e) Definition of $Y_T$ by equation (14). (f) Apply the Portmanteau theorem. (g) By consistency adequacy (see Remark 18 in Appendix A on p. 43, and note that $\mathrm{P}$ and $\tilde\theta_T$ can take the place of $\mathbb{P}$ and $\theta_T^*$, respectively, in the statement of Proposition), $\mathrm{P}(Z_T\le\theta\,|\,X_{1:\infty})\xrightarrow{\mathrm{P}}1_{[\theta_0,\infty[}(\theta)$, where $\mathrm{P}(Z_T\le\theta\,|\,X_{1:\infty})=F_{\theta_T^*}(\theta)$ by equation (13). (h) By Lemma 7(iii), $F_{\theta_T^*}(\theta)\xrightarrow{\mathbb{P}}1_{[\theta_0,\infty[}(\theta)$ at the continuity points of $1_{[\theta_0,\infty[}(.)$. Then, note that the convergence w.r.t. the Prokhorov metric corresponds to the convergence in law, which, in turn, is equivalent to the convergence of the c.d.f. at the continuity points of the limiting c.d.f. □


Lemma 7. Let k : x —)■ [0,1] be a random probability measure from to ©. 

Under As sumptions^ and^ a) (b), there exist 


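i) a probability measure $\mathrm{P}$ on $(\Omega\times\Theta,\mathcal{E}_\Omega\otimes\mathcal{E}_\Theta)$ s.t., for all $A\in\mathcal{E}_\Omega\otimes\mathcal{E}_\Theta$,

$$\mathrm{P}(A)=\mathbb{E}\int_\Theta 1_A(.,\theta)\,\kappa(X_{1:\infty},d\theta),\qquad(15)$$

ii) a random vector $Z:(\Omega\times\Theta,\mathcal{E}_\Omega\otimes\mathcal{E}_\Theta)\to(\Theta,\mathcal{E}_\Theta)$ s.t., for all $B\in\mathcal{E}_\Theta$, $\mathrm{P}(Z\in B\,|\,X_{1:\infty})=\kappa(X_{1:\infty},B)$ $\mathrm{P}$-a.s.,

iii) and, for all random sequences $(W_T)_{T=1}^{\infty}$, $W_T\xrightarrow{\mathbb{P}}K$, as $T\to\infty$, where $K$ is a constant vector, is equivalent to $W_T\xrightarrow{\mathrm{P}}K$, as $T\to\infty$.

Proof. i)-ii) It corresponds to Lemma 6.9 in Kallenberg (1997/2002). iii) For all neighborhoods $N_K$ of $K$,

$$\begin{aligned}
\mathrm{P}(W_T\in N_K)&=\mathrm{P}\{(\omega,\theta)\in\Omega\times\Theta:W_T(\omega)\in N_K\}\\
&\overset{(a)}{=}\mathbb{E}\int_\Theta 1_{\{\omega\in\Omega:W_T(\omega)\in N_K\}\times\Theta}(.,\theta)\,\kappa(X_{1:\infty},d\theta)\\
&\overset{(b)}{=}\mathbb{E}\,1_{\{\omega\in\Omega:W_T(\omega)\in N_K\}}(.)\int_\Theta\kappa(X_{1:\infty},d\theta)\\
&\overset{(c)}{=}\mathbb{E}\,1_{\{\omega\in\Omega:W_T(\omega)\in N_K\}}(.)\\
&=\mathbb{P}(W_T\in N_K).
\end{aligned}$$

(a) Definition of $\mathrm{P}$ given by equation (15). (b) For all $\omega\in\Omega$ and $\theta\in\Theta$, $1_{\{\omega\in\Omega:W_T(\omega)\in N_K\}\times\Theta}(\omega,\theta)=1_{\{\omega\in\Omega:W_T(\omega)\in N_K\}}(\omega)$. (c) By definition of the Lebesgue integral for step functions, $\int_\Theta\kappa(X_{1:\infty},d\theta)=\kappa(X_{1:\infty},\Theta)$, and then $\kappa(X_{1:\infty},\Theta)=1$ because, for all $x_{1:\infty}$, $\kappa(x_{1:\infty},\cdot)$ is a probability measure on $(\Theta,\mathcal{E}_\Theta)$.

Therefore, $W_T\xrightarrow{\mathbb{P}}K$, as $T\to\infty$, is equivalent to $W_T\xrightarrow{\mathrm{P}}K$, as $T\to\infty$. □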

F.2.2. Consistency of Gaussian approximations. 

Proposition 6. Let^l he a positive-definite matrix, and (St)t=i o sequence ofa{Xi:oo)/B(Il^^)- 
measurable matrices. Under ylssrtmpfzons and 6 , if, as T —)■ cx), St ^ S where S 
is a positive-definite matrix, then, as T ^ oo, 

(<Yy ( a* diag(ST)^ ^ P n 

Pp I 91 I .; OrpQ, ---^=- 1 , 1 -)■ 0, 

which, in turn, implies consistency of the Gaussian approximation by Lemmas and 


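Proof. Check assumptions of Lemma 6 with $F_{\theta_T^*}(.)=\mathfrak{N}(.;\theta_{T,G}^*;\mathrm{diag}(\hat\Sigma_T)^{1/2}/\sqrt{T})$, $\tilde\Sigma_T=\mathrm{diag}(\hat\Sigma_T)$, and for $\tilde\Sigma=\mathrm{diag}(\Sigma)$, and then apply it. Assumption 10 is verified because $F_{\theta_T^*}(.)=\mathfrak{N}(.;\theta_{T,G}^*;\mathrm{diag}(\hat\Sigma_T)^{1/2}/\sqrt{T})=\mathfrak{N}(\sqrt{T}\,\mathrm{diag}(\hat\Sigma_T)^{-1/2}(.-\theta_{T,G}^*);0;I)$ where $\theta_{T,G}^*$ and $\hat\Sigma_T$ are measurable by definition, and $\mathfrak{N}(.;0;I)$ is continuous. Assumption 11 is verified because a diagonal matrix is symmetric, and $\Sigma$ is positive-definite by assumption. Assumption (a) of Lemma 6 is verified by assumption. Assumption (b) of Lemma 6 is verified because, by the change of variable $u=\sqrt{T}\,\mathrm{diag}(\hat\Sigma_T)^{-1/2}(\theta-\theta_{T,G}^*)$, for all $b\in\mathbb{R}^p$,

$$\begin{aligned}
F_{\theta_T^*}\Big(\theta_{T,G}^*+\tfrac{\mathrm{diag}(\hat\Sigma_T)^{1/2}}{\sqrt{T}}b\Big)
&=\mathfrak{N}\Big(\theta_{T,G}^*+\tfrac{\mathrm{diag}(\hat\Sigma_T)^{1/2}}{\sqrt{T}}b;\theta_{T,G}^*;\mathrm{diag}(\hat\Sigma_T)^{1/2}/\sqrt{T}\Big)\\
&=\int_{]-\infty,\,\theta_{T,G}^*+\mathrm{diag}(\hat\Sigma_T)^{1/2}b/\sqrt{T}]}\frac{\exp\big[-\tfrac{T}{2}(\theta-\theta_{T,G}^*)'\mathrm{diag}(\hat\Sigma_T)^{-1}(\theta-\theta_{T,G}^*)\big]}{\sqrt{(2\pi)^pT^{-p}|\mathrm{diag}(\hat\Sigma_T)|}}\lambda(d\theta)\\
&=\frac{1}{(2\pi)^{p/2}}\int_{]-\infty,b]}\exp\big[-\tfrac{1}{2}u'u\big]\lambda(du)=\mathfrak{N}(b;0;I).
\end{aligned}$$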

□ 
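
The change of variable in the proof can be checked in the univariate case. In the sketch below, the c.d.f. of a Normal with mean `center` and standard deviation `sigma / sqrt(T)`, evaluated at `center + sigma * b / sqrt(T)`, equals the standard Normal c.d.f. at `b` for every `T`, while at a fixed point away from `center` it tends to 0 or 1, i.e. the approximation concentrates on a point mass. The names and numerical values are assumptions made for this illustration, with `center` standing in for the Gaussian proxy and `sigma` for the square root of the diagonal variance proxy.

```python
from statistics import NormalDist


def gaussian_approx_cdf(theta, center, sigma, T):
    """C.d.f. at theta of a Normal with mean center and std dev sigma / sqrt(T)."""
    return NormalDist(mu=center, sigma=sigma / T ** 0.5).cdf(theta)


def cdf_at_local_point(b, center, sigma, T):
    """C.d.f. evaluated at the local point center + sigma * b / sqrt(T)."""
    return gaussian_approx_cdf(center + sigma * b / T ** 0.5, center, sigma, T)


if __name__ == "__main__":
    center, sigma = 1.3, 2.0  # hypothetical proxy value and scale
    for T in (1, 100, 10000):
        print(T, cdf_at_local_point(0.7, center, sigma, T), NormalDist().cdf(0.7))
```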


F.2.3. Consistency of Laplace approximations. In this subsection, under mild technical assumptions, we deduce the consistency of weighted and criterion-adjusted Laplace approximations from the Bernstein-von Mises theorem, i.e., Assumption 12(c).


Assumption 12. (a)For T big enough, w{.) ^ 0 P-a.s. (b) For T big enough, 0 < 
/©< 00 P-a.s. (c) [Bernstein-von Mises theorem] For all b G R^, 
as T ^ 00 , 


Fe-. I Olfr + 




4 91(6; 0;E2) 


e 


where Fg.^{e) := /_ 
For all Xi:oo G 




°° f0e'^^T(Xi:T’<))w{e)X{de) 

e^QTC^:T,-)w{.) ^S S&/B{R 


^A(0), and 61] G argmax^gg, 
measurable. 


,TQTiXi:T,9) 


(d) 


Assumption 12(c) has been established under general assumptions (e.g., Le Cam, 1953, 1958; Chen, 1985; Kim, 1998; Chernozhukov and Hong, 2003). Assumption 12(a)(b) ensures that $F_{\theta_T^*}(.)$ is a c.d.f. $\mathbb{P}$-a.s. for $T$ big enough. Assumption 12(d) is a weak requirement that ensures the existence of random variables with a distribution specified through $F_{\theta_T^*}(.)$. For the consistency of the criterion-adjusted weighted Laplace approximation, we also require the following Assumption 13.


Assumption 13. (a) For all6 &&, 6 ^ u{9,9) is continuous and bounded. 

(b) There exists a function h : © —R s.L, for all 0 G ©, forT big enough, u(9, Q'jo^QTiXi-.T 
h{6) P-a.s., and 4h(0)A(d0) < 00 . 


Assumption 13(a) typically follows from the continuity of $u(.,.)$, and the compactness of $\Theta$, while Assumption 13(b) typically follows from the assumptions on $Q_T(X_{1:T},\theta)$ that are needed to establish Assumption 12(c).


Proposition 7. Under Assumptions 


111 and 


12, if, as T ^ 00 , 9]j^ —)■ 9q, then 


’®4(0)A(d0) ■ 














i) pp A 0, as T —)■ oo, which, in turn, implies consistency of the 

weighted Laplace approximation by Lemmas \^and\^- 

ii) under the additional Assumptions^a) and 13. as T ^ oo, 

^ \f^ 2 n(e,e)eTQT(Xi-.T.e)y,{e)x{de)x{d 9 ) ’ J^u{e,eo)\{de) J ^ wmcn, m lurn, implies 
consistency of the criterion-adjusted weighted Laplace approximation by Lemmas 

d and\^ 


Proof, i) Check assumptions of Lemma with Up = h, and for S = /, and then apply 
it. Assumption 10 is verified because, under Assumptions [l^a) (b)(d), by the upcoming 


Lemma 


for T big enough, P o Otf 


-1 


:=/. 


X{6) defines a random 
are 


T,WL ■ J. p^prQTi>^i-.T^9)w{e)\{de) 

probability measure from to (&,Ps) ■ Assumptions (a) and (b) of Lemma 
verihed by assumption of Proposition and Assumption [T^c), respectively. 

ii) By i), as T —)• oo, Pp A 0. Thus, by definition of the convergence in 

law and Assumption f^a), for all 0 G ©, as T —)■ oo. 


f .. IQ) .. 

/ TOiX I 


(g) 


r r .. Q^^T{Xi.T,e)yjiQ) 

WBe£&, / / u{9,9)- -^^-M—^ 

JbJ& ^ J^e^QT{x,..TP)^,^0)x^d9) 


\{d9)X{d9) 4 / M(0,0o)A(d0) 


IB 


LJeU(9.9)e^g"l-^'-"-‘'V'(9)A(d9)A(d9) , u(D, D„)MM) 
®’ /g,ti(9,9)e’’Or(.^i:T,<>)TO(9)A(d9)A(d9) /gti(9,9o)A(d9) 

(.) . f ile«(», 


Pp 


f^2u(9,9)e^QT(x^:TP)uJ(Q)X(d9)X(d9) ’ f^u(9,9o)X(d9) 1 


(a) By Assumption [T^b), apply Lebesgue dominated convergence theorem, (b) Note 

that simplification of the denominators yields Ae i.t )w{ 0 )\(de)\(de) 

/g,2 «(e,0)e^'3T{M:T.9)u,(e)A(de)A(de) 


/s /® u{0,6) 


prQrXd-.T fi) w(9) 

/© e^‘ 3 T{-^l:T' 9 )u,(e)A{de) 


A(de)A(d0) 


Ie>2uid,0)- 


prQTXi-.T’^)m{e) 


-\{d0)\{d0) 


-, and then apply twice the previous line, with B = 


/e e^' 3 T(''^l:T.®)m( 0 )A(de) 

B and B = &. (c) Apply Portmanteau theorem. 


□ 


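Lemma 8. If, for all $x_{1:\infty}\in\mathbf{S}^{\infty}$, a function $g:\mathbf{S}^{\infty}\times\Theta\to\mathbb{R}$ is $\mathcal{E}_\Theta/\mathcal{B}(\mathbb{R})$-measurable on $\Theta$, then, for all $A\in\mathcal{E}_\Theta$, $x_{1:\infty}\mapsto\int_A g(x_{1:\infty},\theta)\lambda(d\theta)$ is $\mathcal{S}^{\infty}/\mathcal{B}(\mathbb{R})$-measurable.

Proof. Let $A\in\mathcal{E}_\Theta$. Define for this proof

$$\mathcal{H}_A:=\left\{h(.,.)\ :\ \begin{array}{l}h:\mathbf{S}^{\infty}\times\Theta\to\mathbb{R}\\ \forall\theta\in\Theta,\ h(.,\theta)\text{ is }\mathcal{S}^{\infty}/\mathcal{B}(\mathbb{R})\text{-measurable}\\ x_{1:\infty}\mapsto\int_A h(x_{1:\infty},\theta)\lambda(d\theta)\text{ is }\mathcal{S}^{\infty}/\mathcal{B}(\mathbb{R})\text{-measurable}\end{array}\right\}.$$

Check the assumptions of a functional form of the Sierpiński monotone class theorem (e.g., Florens, Mouchart and Rolin, 1990, Theorem 0.2.21). First, $\mathcal{H}_A$ is an $\mathbb{R}$-vector space because measurability is preserved by linear combinations, and because the integral of a linear combination of functions is the linear combination of the integrals of the respective functions. Second, $\mathcal{H}_A$ contains the constant function 1. Third, if $(h_n(.))_{n\ge1}$ is a sequence of non-negative functions in $\mathcal{H}_A$ such that $h_n(.)\uparrow h(.)$ where $h(.)$ is a bounded function on $\Theta$, then $h(.)\in\mathcal{H}_A$ by preservation of measurability under limit and the Lebesgue monotone convergence theorem. Fourth, $\mathcal{H}_A$ contains the indicator function of every set in the $\pi$-system consisting of measurable rectangles, $\mathcal{I}:=\{\tilde R:\tilde R:=R\cap\Theta$ with $R=\prod_{i=1}^{p}]a_i,b_i]$ where $(a_i,b_i)\in\mathbb{R}^2\wedge a_i\le b_i\}$ (an intersection of two measurable rectangles is a measurable rectangle) because, for all $\tilde R\in\mathcal{I}$, (i) $\forall\theta\in\Theta$, $x_{1:\infty}\mapsto1_{\tilde R}(\theta)$ is $\mathcal{S}^{\infty}/\mathcal{B}(\mathbb{R})$-measurable, (ii) and for all $A\in\mathcal{E}_\Theta$, $x_{1:\infty}\mapsto\int_A1_{\tilde R}(\theta)\lambda(d\theta)=\lambda(A\cap\tilde R)\le\lambda(\Theta)<\infty$ is $\mathcal{S}^{\infty}/\mathcal{B}(\mathbb{R})$-measurable.

As $\sigma(\mathcal{I})=\mathcal{E}_\Theta$, by a functional form of the Sierpiński monotone class theorem, if a function $g(.,.)$ is $\mathcal{E}_\Theta/\mathcal{B}(\mathbb{R})$-measurable, $g\in\mathcal{H}_A$, and thus $x_{1:\infty}\mapsto\int_A g(x_{1:\infty},\theta)\lambda(d\theta)$ is $\mathcal{S}^{\infty}/\mathcal{B}(\mathbb{R})$-measurable. □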


Appendix G. Notations and overview of some practices 


